Return to search

Mixture Model Averaging for Clustering

Model-based clustering is based on a finite mixture of distributions, where each mixture component corresponds to a different group, cluster, subpopulation, or part thereof. Gaussian mixture distributions are most often used. Criteria commonly used in choosing the number of components in a finite mixture model include the Akaike information criterion, Bayesian information criterion, and the integrated completed likelihood. The best model is taken to be the one with highest (or lowest) value of a given criterion. This approach is not reasonable because it is practically impossible to decide what to do when the difference between the best values of two models under such a criterion is ‘small’. Furthermore, it is not clear how such values should be calibrated in different situations with respect to sample size and random variables in the model, nor does it take into account the magnitude of the likelihood. It is, therefore, worthwhile considering a model-averaging approach. We consider an averaging of the top M mixture models and consider applications in clustering and classification. In the course of model averaging, the top M models often have different numbers of mixture components. Therefore, we propose a method of merging Gaussian mixture components in order to get the same number of clusters for the top M models. The idea is to list all the combinations of components for merging, and then choose the combination corresponding to the biggest adjusted Rand index (ARI) with the ‘reference model’. A weight is defined to quantify the importance of each model. The effectiveness of mixture model averaging for clustering is proved by simulated data and real data under the pgmm package, where the ARI from mixture model averaging for clustering are greater than the one of corresponding best model. The attractive feature of mixture model averaging is it’s computationally efficiency; it only uses the conditional membership probabilities. Herein, Gaussian mixture models are used but the approach could be applied effectively without modification to other mixture models. / Paul McNicholas

Identiferoai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OGU.10214/3538
Date30 April 2012
CreatorsWei, Yuhong
ContributorsMcNicholas, Paul
Source SetsLibrary and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
LanguageEnglish
Detected LanguageEnglish
TypeThesis

Page generated in 0.002 seconds