Ann Oper Res (2009) 168: 151–168 DOI 10.1007/s10479-008-0371-9
Cluster-based outlier detection
Lian Duan · Lida Xu · Ying Liu · Jun Lee
Published online: 12 June 2008 © Springer Science+Business Media, LLC 2008
Abstract Outlier detection has important applications in the field of data mining, such as fraud detection, customer behavior analysis, and intrusion detection. Outlier detection is the process of detecting data objects that are grossly different from or inconsistent with the remaining set of data. Outliers are traditionally considered as single points; however, a key observation is that many abnormal events have both temporal and spatial locality and may therefore form small clusters that also need to be deemed outliers. In other words, not only a single point but also a small cluster can be an outlier. In this paper, we present a new definition for outliers, the cluster-based outlier, which is meaningful and gives importance to local data behavior, and we show how to detect such outliers with the clustering algorithm LDBSCAN (Duan et al. in Inf. Syst. 32(7):978–986, 2007), which is capable of finding clusters and assigning LOF (Breunig et al. in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, ACM Press, pp. 93–104, 2000) to single points.
Keywords Outlier detection · Cluster-based outlier · LDBSCAN · Local outlier factor
L. Duan (✉)
Management Sciences Department, University of Iowa, Iowa City, IA, USA e-mail: [email protected]
L. Xu College of Economics and Management, Beijing Jiaotong University, Beijing 100044, China e-mail: [email protected]
L. Xu Department of Information Technology & Decision Science, Old Dominion University, Norfolk, VA 23529, USA
Y. Liu Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing, China e-mail: [email protected]
J. Lee China Science and Technology Network, Chinese Academy of Sciences, Beijing, China e-mail: [email protected]
1 Introduction
For many KDD applications, such as detecting criminal activities in an e-business environment, finding the rare instances or the outliers can be more interesting than finding the common patterns. Finding such exceptions and outliers has received as much attention in the KDD community as some other topics have. An outlier in a dataset is defined informally as an observation that is considerably different from the remainder, as if it is generated...