Global ETD Search

Cross-Validation for Model Selection in Model-Based Clustering

Clustering is a technique used to partition unlabelled data into meaningful groups. This thesis focuses on model-based clustering, in which the data are assumed to arise from a finite number of subpopulations, each following a known statistical distribution. The number of groups and the shape of each group are unknown in advance, so selecting them is one of the most challenging aspects of clustering.

Cross-validation is a model selection technique often used in regression and classification because it tends to choose models that predict well and are not over-fit to the data. However, it has rarely been applied in a clustering framework. Herein, cross-validation is applied to select the number of groups and the covariance structure within a family of Gaussian mixture models. Results are presented for both real and simulated data. / Ontario Graduate Scholarship Program
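
The idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes one-dimensional data, where "covariance structure" reduces to a shared variance versus a separate variance per component, and it assumes models are scored by mean held-out log-likelihood, a criterion the abstract does not specify. All function names (fit_gmm_1d, cv_score, select_model) are hypothetical, and only the Python standard library is used.

```python
import math
import random


def log_terms(x, params):
    """Log of weight * normal density for each mixture component at x."""
    weights, means, variances = params
    return [math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            for w, m, v in zip(weights, means, variances)]


def log_density(x, params):
    """Log mixture density at x, computed with the log-sum-exp trick."""
    terms = log_terms(x, params)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def fit_gmm_1d(xs, k, equal_var, n_iter=50):
    """Fit a k-component 1-D Gaussian mixture by EM (illustrative only).

    equal_var=True forces one shared variance across components;
    equal_var=False lets each component have its own variance.
    """
    n = len(xs)
    srt = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles of the data.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    var_all = sum((x - centre) ** 2 for x in xs) / n
    floor = max(1e-3 * var_all, 1e-12)  # keeps variances away from zero
    variances = [max(var_all, floor)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            terms = log_terms(x, (weights, means, variances))
            top = max(terms)
            p = [math.exp(t - top) for t in terms]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: update weights, means, and variances.
        nk = [max(sum(r[j] for r in resp), 1e-8) for j in range(k)]
        weights = [c / sum(nk) for c in nk]
        means = [sum(r[j] * x for r, x in zip(resp, xs)) / nk[j]
                 for j in range(k)]
        ss = [sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs))
              for j in range(k)]
        if equal_var:
            variances = [max(sum(ss) / n, floor)] * k
        else:
            variances = [max(ss[j] / nk[j], floor) for j in range(k)]
    return weights, means, variances


def cv_score(xs, k, equal_var, n_folds=5, seed=0):
    """Mean held-out log-likelihood per observation over n_folds folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for f in range(n_folds):
        held_out = set(idx[f::n_folds])
        train = [xs[i] for i in idx if i not in held_out]
        params = fit_gmm_1d(train, k, equal_var)
        total += sum(log_density(xs[i], params) for i in held_out)
    return total / len(xs)


def select_model(xs, ks=(1, 2, 3, 4), structures=(True, False)):
    """Score every (k, equal_var) pair and return the best one."""
    scores = {(k, ev): cv_score(xs, k, ev) for k in ks for ev in structures}
    return max(scores, key=scores.get), scores


# Simulated data: two well-separated groups with unit variance.
rng = random.Random(42)
data = ([rng.gauss(0.0, 1.0) for _ in range(100)]
        + [rng.gauss(10.0, 1.0) for _ in range(100)])
best, scores = select_model(data)
print("selected (k, equal_var):", best)
```

Because each candidate is scored on observations it was not fitted to, adding components or freeing the variances only helps when it improves prediction on new data. For multivariate data, a library that offers several covariance structures would be the more practical choice.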

http://hdl.handle.net/10214/3911

Identifier	oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OGU.10214/3911
Date	04 September 2012
Creators	O'Reilly, Rachel
Contributors	McNicholas, Paul
Source Sets	Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
Language	English
Detected Language	English
Type	Thesis
