Artificial Intelligence Review 22: 85–126, 2004.

© 2004 Kluwer Academic Publishers. Printed in the Netherlands.

A Survey of Outlier Detection Methodologies

VICTORIA J. HODGE & JIM AUSTIN

Department of Computer Science, University of York, York, YO10 5DD UK

(E-mail: fvicky, [email protected])

Abstract. Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults,

changes in system behaviour, fraudulent behaviour, human error, instrument error or

simply through natural deviations in populations. Their detection can identify system

faults and fraud before they escalate with potentially catastrophic consequences. It can

identify errors and remove their contaminating effect on the data set, and so

purify the data for processing. The original outlier detection methods were arbitrary but

now, principled and systematic techniques are used, drawn from the full gamut of

Computer Science and Statistics. In this paper, we introduce a survey of contemporary

techniques for outlier detection. We identify their respective motivations and distinguish

their advantages and disadvantages in a comparative review.

Keywords: anomaly, detection, deviation, noise, novelty, outlier, recognition

1. Introduction

Outlier detection encompasses aspects of a broad spectrum of techniques. Many techniques employed for detecting outliers are fundamentally identical but with different names chosen by the authors. For

example, authors describe their various approaches as outlier detection,

novelty detection, anomaly detection, noise detection, deviation detection or exception mining. In this paper, we have chosen to call the

technique outlier detection although we also use novelty detection where

we feel appropriate but we incorporate approaches from all five categories named above. Additionally, authors have proposed many definitions for an outlier with seemingly no universally accepted definition.

We will take the definition of Grubbs (1969), as quoted in Barnett and

Lewis (1994):

An outlying observation, or outlier, is one that appears to deviate

markedly from other members of the sample in which it occurs.

A further outlier definition from Barnett and Lewis (1994) is:

An observation (or subset of observations) which appears to be

inconsistent with the remainder of that set of data.

In Figure 2, there are five outlier points labelled V, W, X, Y and Z

which are clearly isolated and inconsistent with the main cluster of

points. The data in the figures in this survey paper is adapted from the

Wine data set (Blake...
