Cluster analysis identifies homogeneous groups that are relevant within a population. In model-based clustering, group membership is estimated using a parametric finite mixture model, commonly the mathematically tractable Gaussian mixture model. One-way clustering methods can be restrictive in cases where there are suspected relationships between the variables in each component, leading to the idea of biclustering, which refers to clustering both observations and variables simultaneously. When the relationships between the variables are known, biclustering becomes one-way supervised. To this end, this thesis focuses on a novel one-way supervised biclustering family based on the Gaussian mixture model. In cases where biclustering may be overestimating the number of components in the data, a model averaging technique utilizing Occam's window is applied to produce better clustering results. Automatic outlier detection is introduced into the biclustering family using mixtures of contaminated Gaussian mixture models. Algorithms for model-fitting and parameter estimation are presented for the techniques described in this thesis, and simulation and real data studies are used to assess their performance. / Thesis / Doctor of Philosophy (PhD)
Identifer | oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/21065 |
Date | January 2017 |
Creators | Wong, Monica |
Contributors | McNicholas, Paul, Mathematics and Statistics |
Source Sets | McMaster University |
Language | English |
Detected Language | English |
Type | Thesis |
Page generated in 0.002 seconds