Variable Selection Methods for Model-based Clustering and Application to High-dimensional Data

Clustering helps reveal the natural grouping and internal structure of data. Model-based clustering treats each cluster as a component of a mixture model. As the dimensionality and complexity of the data increase, model-based clustering tends to produce over-parameterized models. It is therefore important to select a subset of critical variables rather than use all variables for clustering. This study considers two variable selection methods for model-based clustering on real-world high-dimensional data: variable selection for clustering and classification (VSCC) and variable selection for model-based clustering (clustvarsel). For simplicity, Gaussian mixture models are applied. Three criteria are used to compare clustering accuracy and efficiency: the adjusted Rand index (ARI), misclustering error, and run time (in seconds).

Thesis / Master of Science (MSc)
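The two accuracy criteria named in the abstract can be illustrated with a minimal sketch. This is not the thesis's own code: the function names `adjusted_rand_index` and `misclustering_error` and the toy label vectors are assumptions made for illustration. The sketch assumes the usual pair-counting definition of the ARI, and it takes misclustering error to be the smallest proportion of disagreeing observations over all one-to-one matchings of cluster labels to true labels.

```python
from collections import Counter
from itertools import permutations
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two observations: the labelings agree trivially
    # Pairs of observations placed together in both, in truth only, in pred only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


def misclustering_error(truth, pred):
    """Smallest share of mismatched observations over all label matchings.

    Uses brute-force permutations, so it is only practical for a small
    number of clusters.
    """
    true_labels = list(dict.fromkeys(truth))
    pred_labels = list(dict.fromkeys(pred))
    # Pad with dummy targets when there are more clusters than true classes.
    extra = max(0, len(pred_labels) - len(true_labels))
    targets = true_labels + [object() for _ in range(extra)]
    best = len(truth)
    for perm in permutations(targets, len(pred_labels)):
        mapping = dict(zip(pred_labels, perm))
        errors = sum(mapping[p] != t for t, p in zip(truth, pred))
        best = min(best, errors)
    return best / len(truth)


# Toy labels (hypothetical): six observations, two true groups.
truth = [0, 0, 0, 1, 1, 1]
pred = [1, 1, 0, 0, 0, 0]
print(adjusted_rand_index(truth, pred))
print(misclustering_error(truth, pred))
```

Both measures ignore how the clusters happen to be named: relabeling the clusters changes neither value, which is why they suit comparisons between different variable selection methods.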

http://hdl.handle.net/11375/27385

Clustering

Statistics

Identifier	oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/27385
Date	January 2022
Creators	Xu, Jini
Contributors	McNicholas, Sharon; Jeganathan, Pratheepa; Mathematics and Statistics
Source Sets	McMaster University
Language	English
Detected Language	English
Type	Thesis
