Cluster analysis is commonly described as the classification of unlabeled observations into groups such that they are more similar to one another than to observations in other groups. Model-based clustering assumes that the data arise from a statistical (mixture) model and typically a group of many models are fit to the data, from which the `best' model is selected by a model selection criterion (often the BIC in mixture model applications). This chosen model is then the only model that is used for making inferences on the data. Although this is common practice, proceeding in this way ignores a large component of model selection uncertainty, especially for situations where the difference between the model selection criterion for two competing models is relatively insignificant. For this reason, recent interest has been placed on selecting a subset of models that are close to the selected best model and using a weighted averaging approach to incorporate information from multiple models in this set. Model averaging is not a novel approach, yet its presence in a clustering framework is minimal. Here, we use Occam's window to select a subset of models eligible for two types of averaging techniques: averaging a posteriori probabilities, and direct averaging of model parameters. The efficacy of these model-based averaging approaches is demonstrated for a family of generalized hyperbolic mixture models using real and simulated data. / Thesis / Master of Science (MSc)
Identifer | oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/22147 |
Date | 11 1900 |
Creators | Ricciuti, Sarah |
Contributors | McNicholas, Paul, Statistics |
Source Sets | McMaster University |
Language | English |
Detected Language | English |
Type | Thesis |
Page generated in 0.0023 seconds