Variable Selection Methods for Model-based Clustering and Application to High-dimensional Data

Clustering helps reveal the natural grouping and internal structure of data. Model-based clustering treats each cluster as a component of a finite mixture model. As data dimensionality and complexity increase, model-based clustering tends to over-parameterize the model. It is therefore important to select a subset of critical variables rather than use all available variables for clustering. This study considers two variable selection methods for model-based clustering of real-world high-dimensional data: variable selection for clustering and classification (VSCC) and variable selection for model-based clustering (clustvarsel). For simplicity, Gaussian mixture models were used. Three criteria are used to compare clustering accuracy and efficiency: the adjusted Rand index (ARI), the misclustering error, and the running time (in seconds).

Thesis / Master of Science (MSc)
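The abstract names the two accuracy criteria without defining them. The following is a minimal Python sketch of how they are commonly computed, assuming the standard Hubert and Arabie form of the ARI and defining the misclustering error as one minus the accuracy under the best one-to-one matching of cluster labels to true classes. The function names are hypothetical, and this is not the thesis's own code (VSCC and clustvarsel are R packages).

```python
from collections import Counter
from itertools import permutations
from math import comb

_PAD = object()  # sentinel label used only to pad the shorter label set


def adjusted_rand_index(labels_true, labels_pred):
    """ARI computed from the contingency table of true classes vs. clusters."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two observations")
    # Pair counts within each contingency cell, each class, and each cluster.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)  # expected index under random labelling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # trivial partitions (all one cluster / all singletons)
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def misclustering_error(labels_true, labels_pred):
    """Share of points misassigned under the best one-to-one label matching.

    Brute force over permutations, so only practical for a few clusters.
    """
    classes = list(set(labels_true))
    clusters = list(set(labels_pred))
    counts = Counter(zip(labels_true, labels_pred))
    k = max(len(classes), len(clusters))
    classes += [_PAD] * (k - len(classes))
    clusters += [_PAD] * (k - len(clusters))
    # Missing (class, cluster) pairs count as 0 in a Counter.
    best = max(
        sum(counts[(c, g)] for c, g in zip(classes, perm))
        for perm in permutations(clusters)
    )
    return 1 - best / len(labels_true)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    print(adjusted_rand_index(truth, [1, 1, 1, 0, 0, 0]))  # label swap only
    print(misclustering_error(truth, [0, 0, 1, 1, 1, 1]))  # one point misassigned
```

Both criteria are invariant to how clusters are named, which is why a pure label swap scores perfectly. For more than a handful of clusters, the permutation search would normally be replaced by the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment` on the contingency table).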

Identifier: oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/27385
Date: January 2022
Creators: Xu, Jini
Contributors: McNicholas, Sharon; Jeganathan, Pratheepa; Mathematics and Statistics
Source Sets: McMaster University
Language: English
Detected Language: English
Type: Thesis

Page generated in 0.0022 seconds