A Scalable and Efficient Outlier Detection Strategy for Categorical Data

Outlier detection has received significant attention in many applications, such as credit card fraud detection and network intrusion detection. Most existing research efforts focus on numerical datasets and cannot be directly applied to categorical datasets, where there is little sense in ordering the data and calculating distances among data points. Furthermore, a number of current outlier detection methods require quadratic time with respect to the dataset size and usually need multiple scans of the data; these features are undesirable when the datasets are large and scattered over multiple geographically distributed sites. In this paper, we experimentally evaluate a few representative current outlier detection approaches (one based on entropy and two based on frequent itemsets) that are geared towards categorical datasets. In addition, we introduce a simple, scalable and efficient outlier detection algorithm that has the advantage of discovering outliers in categorical datasets by performing a single scan of the dataset. We compare this newly introduced algorithm with the aforementioned existing strategies. The conclusion from this comparison is that the simple algorithm we introduce is more efficient (faster) than the existing strategies and as effective (accurate) in discovering outliers.
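
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one simple way to score categorical records without computing distances: rate each record by how common its attribute values are across the dataset. The scoring rule, the function names (`frequency_scores`, `lowest_scoring`) and the toy records are all assumptions made for illustration and should not be read as the method introduced in the thesis. The sketch holds the records in memory and makes one counting pass followed by one scoring pass.

```python
from collections import Counter


def frequency_scores(rows):
    """Score each row by the mean frequency of its attribute values.

    Hypothetical scoring rule for illustration; not taken from the thesis.
    Rows are equal-length tuples of categorical values.
    """
    if not rows:
        return []
    n_attrs = len(rows[0])
    # Counting pass: one frequency table per attribute (column).
    counts = [Counter() for _ in range(n_attrs)]
    for row in rows:
        for j, value in enumerate(row):
            counts[j][value] += 1
    # Scoring pass: rows made of rare values get a low score.
    return [
        sum(counts[j][value] for j, value in enumerate(row)) / n_attrs
        for row in rows
    ]


def lowest_scoring(rows, k):
    """Return the indices of the k rows with the lowest scores."""
    scores = frequency_scores(rows)
    order = sorted(range(len(rows)), key=lambda i: scores[i])
    return order[:k]


# Toy data (made up): the last record uses values that are rare in every column.
rows = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("red", "small", "round"),
    ("blue", "large", "square"),
]
```

Under these assumptions, calling `lowest_scoring(rows, 1)` on the toy data flags the last record, since each of its values is the rarest or tied for rarest in its column. The cost is linear in the number of records times the number of attributes, which is the kind of scaling the abstract contrasts with quadratic-time methods.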

https://stars.library.ucf.edu/honorstheses1990-2015/656

Computer Engineering

Identifier	oai:union.ndltd.org:ucf.edu/oai:stars.library.ucf.edu:honorstheses1990-2015-1655
Date	01 January 2007
Creators	Ortiz, Enrique
Publisher	STARS
Source Sets	University of Central Florida
Language	English
Detected Language	English
Type	text
Source	HIM 1990-2015
