Return to search

A Scalable and Efficient Outlier Detection Strategy for Categorical Data

Outlier detection has received significant attention in many applications, such as credit card fraud detection and network intrusion detection. Most of the existing research efforts focus on numerical datasets and cannot be directly applied to categorical sets where there is little sense in ordering the data and calculating distances among data points. Furthermore, a number of the current outlier detection methods require quadratic time with respect to the dataset size and usually need multiple scans of the data; these features are undesirable when the datasets are large and scattered over multiple geographically distributed sites. In this paper, we focus and evaluate, experimentally, a few representative current outlier detection approaches ( one based on entropy and two based on frequent itemsets) that are geared towards categorical sets. In addition, we introduce a simple, scalable and efficient outlier detection algorithm that has the advantage of discovering outliers in categorical datasets by performing a single scan of the dataset. This newly introduced outlier detection algorithm is compared with the existing, and aforementioned outlier detection strategies. The conclusion from this comparison is that the simple outlier detection algorithm that we introduce is more efficient (faster) than the existing strategies, and as effective (accurate) in discovering outliers.

Identiferoai:union.ndltd.org:ucf.edu/oai:stars.library.ucf.edu:honorstheses1990-2015-1655
Date01 January 2007
CreatorsOrtiz, Enrique
PublisherSTARS
Source SetsUniversity of Central Florida
LanguageEnglish
Detected LanguageEnglish
Typetext
SourceHIM 1990-2015

Page generated in 0.0016 seconds