Global ETD Search

An Efficient Implementation of a Robust Clustering Algorithm

Clustering and classification are fundamental problems in statistical and machine learning, with a broad range of applications. A common approach is the Gaussian mixture model, which assumes that each cluster or class arises from a distinct Gaussian distribution. This thesis studies a robust, high-dimensional extension of the Gaussian mixture model that automatically detects outliers and noise, and a computationally efficient implementation thereof.

The contaminated Gaussian distribution is a robust elliptical distribution that allows for automatic detection of "bad points", and is used to make the usual factor analysis model robust. In turn, the mixtures of contaminated Gaussian factor analyzers (MCGFA) algorithm allows robust, high-dimensional clustering, classification, and detection of bad points. A family of MCGFA models is created by introducing different constraints on the covariance structure. A new, efficient implementation of the algorithm is presented, along with an account of its development. The fast implementation permits thorough testing of the MCGFA algorithm, and its performance is compared to that of two natural competitors: parsimonious Gaussian mixture models (PGMM) and mixtures of modified t factor analyzers (MMtFA). The algorithms are tested systematically on simulated and real data. / Thesis / Master of Science (MSc)
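The abstract does not state the density, so the following is only a minimal sketch of how a single contaminated Gaussian can flag bad points. It assumes the standard two-component form, in which a "good" Gaussian with weight alpha shares its mean with a "bad" Gaussian whose covariance is inflated by a factor eta > 1. The names `alpha`, `eta`, `prob_good`, the example values, and the 0.5 threshold are illustrative assumptions. This is not the thesis implementation and does not include the factor-analysis or mixture structure of MCGFA.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate Gaussian density evaluated at each row of x."""
    x = np.atleast_2d(x)
    p = mu.shape[0]
    diff = x - mu
    # Squared Mahalanobis distance of each row from mu.
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(sigma), diff)
    log_det = np.linalg.slogdet(sigma)[1]
    return np.exp(-0.5 * (p * np.log(2 * np.pi) + log_det + maha))

def contaminated_pdf(x, mu, sigma, alpha, eta):
    """Assumed form: alpha * N(mu, sigma) + (1 - alpha) * N(mu, eta * sigma)."""
    good = alpha * gaussian_pdf(x, mu, sigma)
    bad = (1 - alpha) * gaussian_pdf(x, mu, eta * sigma)
    return good + bad

def prob_good(x, mu, sigma, alpha, eta):
    """Posterior probability that each point came from the 'good' component."""
    good = alpha * gaussian_pdf(x, mu, sigma)
    return good / contaminated_pdf(x, mu, sigma, alpha, eta)

# Illustrative (assumed) parameters: 90% good points, covariance inflated 10x.
mu = np.zeros(2)
sigma = np.eye(2)
alpha, eta = 0.9, 10.0

points = np.array([[0.0, 0.0], [6.0, 6.0]])
# Flag a point as bad when the good component is the less likely source.
is_bad = prob_good(points, mu, sigma, alpha, eta) < 0.5
```

In this sketch the point at the centre is kept as good, while the distant point is flagged as bad because the inflated component explains it far better than the good one.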

http://hdl.handle.net/11375/20598

computational statistics

mixture models

Identifier	oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/20598
Date	January 2016
Creators	Blostein, Martin
Contributors	McNicholas, Paul D., Mathematics and Statistics
Source Sets	McMaster University
Language	English
Detected Language	English
Type	Thesis
