Model-based clustering is a probabilistic approach that views each cluster as a component
in an appropriate mixture model. The Gaussian mixture model is one of the
most widely used model-based methods. However, this model tends to perform poorly
when clustering high-dimensional data due to the over-parametrized solutions that
arise in high-dimensional spaces. This work instead considers the approach of combining
dimension reduction techniques with clustering via a mixture of generalized
hyperbolic distributions. The dimension reduction techniques principal component
analysis and factor analysis, along with their extensions, were reviewed. Each
technique was then individually paired with the mixture
of generalized hyperbolic distributions to demonstrate the clustering performance
achieved under each method on both simulated and real data sets. For a
majority of the data sets, the clustering method utilizing principal component analysis
exhibited better classification results than the clustering method based
on extending the factor analysis model. / Thesis / Master of Science (MSc)
Identifier | oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/22758 |
Date | January 2018 |
Creators | Pathmanathan, Thinesh |
Contributors | McNicholas, Sharon, Statistics |
Source Sets | McMaster University |
Language | English |
Detected Language | English |
Type | Thesis |