Outlier Detection in Gaussian Mixture Models

Unsupervised classification is often plagued by outliers, yet there is a paucity of work on handling outliers in this setting. Mixtures of Gaussian distributions are a popular choice in model-based clustering. Even a single outlier can affect parameter estimation and, as such, must be accounted for. The issue is further complicated by the presence of multiple outliers. Predicting the proportion of outliers correctly is paramount, as it minimizes misclassification error. It is proved that, for a finite Gaussian mixture model, the log-likelihoods of the subset models are distributed according to a mixture of beta-type distributions. This relationship is leveraged in two ways. First, an algorithm is proposed that predicts the proportion of outliers by measuring the adherence of a set of subset log-likelihoods to a beta-type mixture reference distribution. This algorithm removes the least likely points, which are deemed outliers, until model assumptions are met. Second, a hypothesis test is developed which, at a chosen significance level, can test whether a dataset contains a single outlier.

Thesis / Master of Science (MSc)
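The idea of a subset log-likelihood can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes a single univariate Gaussian rather than a finite mixture, it only ranks points and does not compare the subset log-likelihoods against the beta-type reference distribution, and the function names and sample data are hypothetical.

```python
import math


def gaussian_loglik(xs):
    """Maximised log-likelihood of a univariate Gaussian fitted to xs by MLE."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # MLE variance (divides by n)
    # At the MLE the quadratic term sums to n/2, leaving this closed form.
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def subset_logliks(xs):
    """Log-likelihood of each leave-one-out subset (xs without the j-th point)."""
    return [gaussian_loglik(xs[:j] + xs[j + 1:]) for j in range(len(xs))]


def least_likely_index(xs):
    """Index of the point whose removal gives the highest subset log-likelihood."""
    lls = subset_logliks(xs)
    return max(range(len(xs)), key=lls.__getitem__)


# Illustrative data (assumed, not from the thesis): six typical points, one extreme.
data = [0.1, -0.4, 0.3, 0.0, 0.2, -0.1, 8.0]
candidate = least_likely_index(data)
print(candidate, data[candidate])
```

In this simplified setting, the point whose removal raises the log-likelihood the most is the natural candidate for trimming. The abstract's algorithm goes further by repeating the removal and stopping once the subset log-likelihoods adhere to the reference distribution, a step this sketch leaves out.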

http://hdl.handle.net/11375/25930

Identifier	oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/25930
Date	January 2020
Creators	Clark, Katharine
Contributors	McNicholas, Paul, Statistics
Source Sets	McMaster University
Language	English
Detected Language	English
Type	Thesis
