Artificial Intelligence Review 22: 85–126, 2004.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.

A Survey of Outlier Detection Methodologies

VICTORIA J. HODGE & JIM AUSTIN
Department of Computer Science, University of York, York, YO10 5DD UK
(E-mail: fvicky, [email protected])

Abstract. Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults,
changes in system behaviour, fraudulent behaviour, human error, instrument error or
simply through natural deviations in populations. Their detection can identify system
faults and fraud before they escalate with potentially catastrophic consequences. It can
identify errors and remove their contaminating effect on the data set and, as such,
purify the data for processing. The original outlier detection methods were arbitrary, but
principled and systematic techniques are now used, drawn from the full gamut of
Computer Science and Statistics. In this paper, we introduce a survey of contemporary
techniques for outlier detection. We identify their respective motivations and distinguish
their advantages and disadvantages in a comparative review.

Keywords: anomaly, detection, deviation, noise, novelty, outlier, recognition

1. Introduction

Outlier detection encompasses aspects of a broad spectrum of techniques. Many techniques employed for detecting outliers are fundamentally identical but with different names chosen by the authors. For
example, authors describe their various approaches as outlier detection,
novelty detection, anomaly detection, noise detection, deviation detection or exception mining. In this paper, we have chosen to call the
technique outlier detection, although we also use novelty detection where
we feel it appropriate, and we incorporate approaches from all five categories named above. Additionally, authors have proposed many definitions for an outlier, with seemingly no universally accepted definition.
We will take the definition of Grubbs (1969), as quoted in Barnett and
Lewis (1994):

    An outlying observation, or outlier, is one that appears to deviate
    markedly from other members of the sample in which it occurs.

A further outlier definition from Barnett and Lewis (1994) is:

    An observation (or subset of observations) which appears to be
    inconsistent with the remainder of that set of data.

In Figure 2, there are five outlier points labelled V, W, X, Y and Z
which are clearly isolated and inconsistent with the main cluster of
points. The data in the figures in this survey paper is adapted from the
Wine data set (Blake...