A comparative study of correlational outlier detection metrics

The present investigation was a Monte Carlo experiment designed to evaluate the performance of several metrics in detecting correlational outliers. The metrics compared were the Mahalanobis D², Bacon MLD, Carrig D, MCD, robust PCLOW and robust PCHIGH. This was the first comparative simulation study to include robust PCLOW and robust PCHIGH. The Mahalanobis D², MCD, robust PCLOW and robust PCHIGH were each applied using an approximate statistical criterion. The Carrig D and Bacon MLD were applied using a "natural drop" approach, which uses the k-means clustering algorithm to separate the scores on a metric into two groups: outlying and non-outlying. Both majority and contaminant observations were generated from multivariate normal distributions based on factor-analytic models. Experimental factors included the majority versus contaminant communality level, the majority-contaminant factor-model scenario, the number of variables, the sample size and the fraction of outliers. Results indicated that applying the Carrig D and Bacon MLD with the "natural drop" method leads to intolerably high false-alarm rates. Overall, robust PCLOW clearly outperformed robust PCHIGH. Surprisingly, in certain experimental conditions where PCLOW had been expected to outperform MCD, it did not perform noticeably better. The range of conditions in this study was limited. Future comparative studies of the metrics could include conditions of non-normality and hybrid types of outliers (i.e., outliers that are both mean-shift and correlational). Despite its poor performance in this study, I theorize that robust PCHIGH could have an advantage over MCD in spotting certain kinds of mean-shift outliers. Research into the distributional properties of the Carrig D is also warranted.
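The abstract describes two ways of turning metric scores into outlier flags: an approximate statistical cutoff (used for D², MCD and the robust PC metrics) and the k-means "natural drop" (used for the Carrig D and Bacon MLD). The sketch below illustrates both rules on the classical Mahalanobis D² only; it is not the study's code and does not implement MCD, Bacon MLD, Carrig D, PCLOW or PCHIGH. Several choices are assumptions: the cutoff is taken to be the 0.975 chi-square quantile with p degrees of freedom (a common choice for D²), approximated with the Wilson-Hilferty formula to stay within the standard library; the natural drop is run as 2-means on the scores, initialized at the minimum and maximum, with the higher-centroid group labeled outlying; and the data come from a one-factor model whose loadings (and hence communalities) are hypothetical. All function names are hypothetical helpers.

```python
import random
import statistics
from statistics import NormalDist


def one_factor_sample(n, loadings, rng):
    """Draw n rows from a one-factor model x_j = l_j * f + e_j.

    f ~ N(0, 1) and e_j ~ N(0, 1 - l_j**2), so every variable has unit
    variance and communality l_j**2 (requires |l_j| <= 1).
    """
    rows = []
    for _ in range(n):
        f = rng.gauss(0.0, 1.0)
        rows.append([l * f + rng.gauss(0.0, (1.0 - l * l) ** 0.5) for l in loadings])
    return rows


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for small p)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mahalanobis_d2(rows):
    """Classical (non-robust) squared Mahalanobis distance of each row."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    inv = invert(cov)
    d2 = []
    for r in rows:
        d = [r[j] - mean[j] for j in range(p)]
        d2.append(sum(d[i] * inv[i][j] * d[j] for i in range(p) for j in range(p)))
    return d2


def chi2_quantile_wh(prob, df):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(prob)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * h ** 0.5) ** 3


def natural_drop(scores, max_iter=100):
    """Split 1-D scores into two groups with 2-means (Lloyd's algorithm).

    Centroids start at the min and max score; True marks membership in the
    higher-centroid ("outlying") group.
    """
    lo, hi = min(scores), max(scores)
    for _ in range(max_iter):
        low = [s for s in scores if abs(s - lo) <= abs(s - hi)]
        high = [s for s in scores if abs(s - lo) > abs(s - hi)]
        new_lo = statistics.fmean(low)
        new_hi = statistics.fmean(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return [abs(s - lo) > abs(s - hi) for s in scores]


if __name__ == "__main__":
    rng = random.Random(2012)
    p = 4
    # Majority: all loadings +0.8. Contaminants (hypothetical): same
    # communality 0.64, but alternating signs flip some correlations.
    majority = one_factor_sample(190, [0.8] * p, rng)
    contaminants = one_factor_sample(10, [0.8, -0.8] * (p // 2), rng)
    data = majority + contaminants
    d2 = mahalanobis_d2(data)
    cutoff = chi2_quantile_wh(0.975, p)
    by_cutoff = [v > cutoff for v in d2]
    by_drop = natural_drop(d2)
    truth = [False] * len(majority) + [True] * len(contaminants)
    for name, flags in (("chi-square cutoff", by_cutoff), ("natural drop", by_drop)):
        hits = sum(f and t for f, t in zip(flags, truth))
        false_alarms = sum(f and not t for f, t in zip(flags, truth))
        print(f"{name}: {hits}/{len(contaminants)} contaminants flagged, "
              f"{false_alarms} false alarms")
```

Running the script prints, for each rule, how many of the planted contaminants were flagged and how many majority rows were falsely flagged; these counts depend on the seed and are not results from the study. The contaminants share the majority's communality and means and differ only in the sign pattern of their loadings, which is what makes them correlational rather than mean-shift outliers. Pure Python was used only to keep the sketch dependency-free; a real replication would use a numerical library and robust estimators.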

Identifier: oai:union.ndltd.org:UTEXAS/oai:repositories.lib.utexas.edu:2152/18106
Date: 01 October 2012
Creators: Ritter, Paul Muse, 1961-
Source Sets: University of Texas
Language: English
Detected Language: English
Format: electronic
Rights: Copyright is held by the author. Presentation of this material on the Libraries' web site by University Libraries, The University of Texas at Austin was made possible under a limited license grant from the author who has retained all copyrights in the works.
