
Estimating p-values for outlier detection

Outlier detection is useful in a vast number of domains, wherever there is data and a need for analysis. The research area related to outlier detection is large, and the number of available approaches is constantly growing. Most of the approaches produce a binary result: either outlier or not. This work investigates approaches that detect outliers by producing a p-value estimate. Such approaches are interesting because their results can easily be compared against each other, followed over time, or used with a variable threshold. Four approaches are subjected to a variety of tests to measure their suitability when the data is distributed in a number of ways. The first approach, R2S, was developed at Halmstad University and is based on finding the mid-point of the data. The second approach is based on one-class support vector machines (OCSVM). The third and fourth approaches are both based on conformal anomaly detection (CAD) but use different nonconformity measures (NCMs): the Mahalanobis distance to the mean and a variation of k-NN. R2S and CAD Mahalanobis are both good at estimating p-values from data generated by unimodal and symmetrical distributions. CAD k-NN is good at estimating p-values when the data is generated by a bimodal or extremely asymmetric distribution. The OCSVM does not excel in any scenario but produces good average results in most of the tests. The approaches are also applied to real data, where they all produce comparable results.
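As an illustration of how a conformal anomaly detector can turn a nonconformity measure into a p-value estimate, the following is a minimal sketch, not the thesis's implementation. It assumes an inductive (split) variant of conformal anomaly detection, with the Mahalanobis distance to the training mean as the NCM. The function names `mahalanobis_ncm` and `conformal_p_values`, the train/calibration split, and the synthetic Gaussian data are all assumptions made for this example.

```python
import numpy as np


def mahalanobis_ncm(train, points):
    """Nonconformity score: Mahalanobis distance from each point to the mean of `train`."""
    mean = train.mean(axis=0)
    cov = np.atleast_2d(np.cov(train, rowvar=False))
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance
    diff = points - mean
    squared = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(squared, 0.0))  # clip tiny negative rounding errors


def conformal_p_values(train, calibration, test):
    """Estimate a p-value per test point from calibration nonconformity scores.

    p = (number of calibration scores >= test score, plus 1) / (n_calibration + 1)
    """
    cal_scores = mahalanobis_ncm(train, calibration)
    test_scores = mahalanobis_ncm(train, test)
    counts = (cal_scores[None, :] >= test_scores[:, None]).sum(axis=1)
    return (counts + 1) / (len(cal_scores) + 1)


# Synthetic, unimodal and symmetric data (an assumption for illustration only).
rng = np.random.default_rng(0)
data = rng.normal(size=(600, 2))
train, calibration = data[:300], data[300:]

# One point near the centre of the data and one far from it.
test = np.array([[0.0, 0.0], [6.0, 6.0]])
p_values = conformal_p_values(train, calibration, test)
print(p_values)
```

A small p-value marks a point as unusual relative to the calibration data. Because the output is a number rather than a binary label, the outlier threshold can be chosen, or changed, after the fact.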

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:hh-25662
Date: January 2014
Creators: Norrman, Henrik
Publisher: Högskolan i Halmstad, Sektionen för Informationsvetenskap, Data– och Elektroteknik (IDE)
Source Sets: DiVA Archive at Uppsala University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
