Mixture model-based clustering has become an increasingly popular data analysis technique since its introduction almost fifty years ago. Families of mixture models are said to arise when the component parameters, usually the component covariance matrices, are decomposed and a number of constraints are imposed. Within the family setting, it is necessary to choose the member of the family --- i.e., the appropriate covariance structure --- in addition to the number of mixture components. To date, the Bayesian information criterion (BIC) has proved most effective for this model selection process, and the expectation-maximization (EM) algorithm has been predominantly used for parameter estimation.
We deviate from the EM-BIC rubric, using variational Bayes approximations for parameter estimation and the deviance information criterion (DIC) for model selection. The variational Bayes approach alleviates some of the computational complexities associated with the EM algorithm. We use this approach on the most famous family of Gaussian mixture models known as Gaussian parsimonious clustering models (GPCM). These models have an eigen-decomposed covariance structure.
Cluster-weighted modelling (CWM) is another flexible statistical framework for modelling local relationships in heterogeneous populations on the basis of weighted combinations of local models. In particular, we extend cluster-weighted models to include an underlying latent factor structure of the independent variable, resulting in a novel family of models known as parsimonious cluster-weighted factor analyzers. The EM-BIC rubric is utilized for parameter estimation and model selection.
Some work on a mixture of multivariate t-distributions is also presented, with a linear model for the mean and a modified Cholesky-decomposed covariance structure leading to a novel family of mixture models. In addition to model-based clustering, these models are also used for model-based classification, i.e., semi-supervised clustering. Parameters are estimated using the EM algorithm and another approach to model selection other than the BIC is also considered. / NSERC PGS-D
Identifer | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OGU.10214/3876 |
Date | 24 August 2012 |
Creators | Dang, Sanjeena |
Contributors | McNicholas, Paul D. |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | English |
Detected Language | English |
Type | Thesis |
Page generated in 0.0022 seconds