1 |
Hyperbolic Distributions and Transformations for Clustering Incomplete Data with Extensions to Matrix Variate Normality
Pocuca, Nikola, January 2023
Under realistic scenarios, data are often incomplete, asymmetric, or high-dimensional. More intricate data structures often render standard approaches infeasible due to methodological or computational limitations. This monograph consists of four contributions, each solving a specific problem within model-based clustering. An R package is developed that implements a three-phase imputation method for both elliptical and hyperbolic parsimonious models. A novel stochastic technique is employed to speed up computations for hyperbolic distributions, demonstrating superior performance overall. A hyperbolic transformation model is conceived for clustering asymmetric data within a heterogeneous context. Finally, for high-dimensionality, a framework is developed for assessing matrix variate normality within three-way data sets. All things considered, this work constitutes a powerful set of tools for dealing with the ever-growing complexity of big data. / Dissertation / Doctor of Science (PhD)
|
2 |
Subspace Clustering with the Multivariate-t Distribution
Pesevski, Angelina, January 2017
Clustering procedures suitable for the analysis of very high-dimensional data are needed for many modern data sets. One model-based clustering approach called high-dimensional data clustering (HDDC) uses a family of Gaussian mixture models to model the sub-populations of the observed data, i.e., to perform cluster analysis. The HDDC approach is based on the idea that high-dimensional data usually exist in lower-dimensional subspaces; as such, the dimension of each subspace, called the intrinsic dimension, can be estimated for each sub-population of the observed data. As a result, each of these Gaussian mixture models can be fitted using only a fraction of the total number of model parameters. This family of models has gained attention due to its superior classification performance compared to other families of mixture models; however, it still suffers from the usual limitations of Gaussian mixture model-based approaches. Herein, a robust analogue of the HDDC approach is proposed. This approach, which extends the HDDC procedure to include the multivariate-t distribution, encompasses 28 models that rectify one of the major shortcomings of the HDDC procedure. Our tHDDC procedure is fitted to both simulated and real data sets and is compared to the HDDC procedure using an image reconstruction problem that arose from satellite imagery of Mars' surface. / Thesis / Master of Science (MSc)
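The subspace idea above can be made concrete with a small sketch. HDDC's actual criterion for the intrinsic dimension is based on the scree of the eigenvalues (or BIC); the cumulative-variance threshold and the `intrinsic_dimension` helper below are illustrative stand-ins, not the thesis's method:

```python
import numpy as np

def intrinsic_dimension(X, threshold=0.9):
    """Estimate a cluster's intrinsic dimension as the smallest number of
    principal components whose eigenvalues capture `threshold` of the total
    variance (a simplified stand-in for HDDC's scree-based criterion)."""
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
    cum = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cum, threshold) + 1)

# Synthetic cluster living (mostly) in a 2-D subspace of 10-D space
rng = np.random.default_rng(0)
scores = rng.normal(size=(500, 2)) @ np.diag([5.0, 3.0])
basis = np.linalg.qr(rng.normal(size=(10, 2)))[0]
X = scores @ basis.T + 0.05 * rng.normal(size=(500, 10))
print(intrinsic_dimension(X))
```

Estimating this dimension separately for each sub-population is what lets each mixture component be fitted with only a fraction of the full parameter count.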
|
3 |
Cross-Validation for Model Selection in Model-Based Clustering
O'Reilly, Rachel, 04 September 2012
Clustering is a technique used to partition unlabelled data into meaningful groups. This thesis will focus on the area of clustering called model-based clustering, where it is assumed that data arise from a finite number of subpopulations, each of which follows a known statistical distribution. The number of groups and shape of each group is unknown in advance, and thus one of the most challenging aspects of clustering is selecting these features.
Cross-validation is a model selection technique which is often used in regression and classification, because it tends to choose models that predict well, and are not over-fit to the data. However, it has rarely been applied in a clustering framework. Herein, cross-validation is applied to select the number of groups and covariance structure within a family of Gaussian mixture models. Results are presented for both real and simulated data. / Ontario Graduate Scholarship Program
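A minimal sketch of the held-out-likelihood idea described above, using scikit-learn's Gaussian mixtures; `cv_select_components` and its parameters are illustrative, not the thesis's implementation (which also selects among covariance structures):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold

def cv_select_components(X, candidates=(1, 2, 3, 4), n_splits=5, seed=0):
    """Choose the number of mixture components by average held-out
    log-likelihood across cross-validation folds."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {}
    for g in candidates:
        folds = []
        for train, test in kf.split(X):
            gm = GaussianMixture(n_components=g, n_init=3, random_state=seed)
            folds.append(gm.fit(X[train]).score(X[test]))  # mean held-out log-density
        scores[g] = float(np.mean(folds))
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-4, 1, size=(150, 2)),
               rng.normal(4, 1, size=(150, 2))])
best, scores = cv_select_components(X)
```

Because each candidate is scored on data it was not fitted to, models that merely over-fit the training folds are penalized, which is exactly why cross-validation is attractive for choosing the number of groups.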
|
4 |
Evolutionary Algorithms for Model-Based Clustering
Kampo, Regina S., January 2021
Cluster analysis is used to detect underlying group structure in data. Model-based clustering performs cluster analysis by fitting finite mixture models. However, parameter estimation in mixture model-based approaches to clustering is notoriously difficult. To this end, this thesis focuses on the development of evolutionary computation as an alternative technique for parameter estimation in mixture models. An evolutionary algorithm is proposed and illustrated on the well-established Gaussian mixture model with missing values. The family of Gaussian parsimonious clustering models is then considered, and an evolutionary algorithm is developed to estimate its parameters. Finally, an evolutionary algorithm is developed for latent Gaussian mixture models to facilitate the flexible clustering of high-dimensional data. For all models and families of models considered in this thesis, the proposed model-fitting and parameter-estimation algorithms are presented and their performance is illustrated on real and simulated data sets to assess the clustering ability of each model. The thesis concludes with a discussion and suggestions for future work. / Dissertation / Doctor of Philosophy (PhD)
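A toy version of the evolutionary idea can be sketched as follows. This is not the thesis's algorithm: to keep it short, each individual is only a set of component means for an equal-weight, isotropic Gaussian mixture, and the operators are plain truncation selection plus Gaussian mutation; `evolve_means` and all settings are illustrative assumptions:

```python
import numpy as np

def loglik(means, X, var=1.0):
    """Log-likelihood of an equal-weight, isotropic Gaussian mixture whose
    component means are the rows of `means` (the fitness function)."""
    d = X.shape[1]
    sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)  # (n, G)
    logpdf = -0.5 * sq / var - 0.5 * d * np.log(2 * np.pi * var)
    m = logpdf.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(logpdf - m).mean(axis=1))).sum())

def evolve_means(X, G=2, pop=30, gens=60, sigma=0.5, seed=0):
    """Truncation-selection evolutionary algorithm: each individual is a full
    set of G component means, mutated with Gaussian noise and selected by
    mixture log-likelihood."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    population = X[rng.choice(n, size=(pop, G))]             # init from data points
    for _ in range(gens):
        fitness = np.array([loglik(ind, X) for ind in population])
        elite = population[np.argsort(fitness)[-pop // 3:]]  # keep top third
        children = elite[rng.integers(len(elite), size=pop - len(elite))]
        children = children + rng.normal(0, sigma, size=children.shape)
        population = np.concatenate([elite, children])
        sigma *= 0.95                                        # anneal mutation size
    fitness = np.array([loglik(ind, X) for ind in population])
    return population[fitness.argmax()]

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-4, 1, size=(100, 2)),
               rng.normal(4, 1, size=(100, 2))])
means = evolve_means(X, G=2)
```

Unlike the EM algorithm, nothing here requires closed-form conditional expectations, which is what makes evolutionary search attractive when those expectations are intractable.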
|
5 |
Non-Gaussian Mixture Model Averaging for Clustering
Zhang, Xu Xuan, January 2017
The Gaussian mixture model has been used for model-based clustering for decades, and most model-based clustering analyses are still based on it. Wei and McNicholas proposed model averaging approaches for Gaussian mixture models, based on a family of 14 Gaussian parsimonious clustering models. In this thesis, we use non-Gaussian mixture models, namely the tEigen family, for our averaging approaches: rather than selecting and fitting a single best model, an averaged model is obtained from a set of multivariate t-mixture models. / Thesis / Master of Science (MSc)
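The averaging idea can be sketched in miniature. The tEigen family is not available in scikit-learn, so Gaussian mixtures with different covariance structures stand in for the candidate set; the BIC-weight formula follows the usual convention (weights proportional to exp(-ΔBIC/2)), and the mean-matching alignment is a simplified way to handle label switching, not the paper's merging procedure:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bic_weights(bics):
    """BIC-based model-averaging weights: w_m ∝ exp(-0.5 (BIC_m - min BIC))."""
    delta = np.asarray(bics) - np.min(bics)
    w = np.exp(-0.5 * delta)
    return w / w.sum()

def aligned_post(model, ref_means, X):
    """Posterior probabilities with components permuted to match a reference
    model's means; averaging must address label switching across fits."""
    perm = [int(np.argmin(((model.means_ - mu) ** 2).sum(axis=1)))
            for mu in ref_means]
    return model.predict_proba(X)[:, perm]

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-4, 1, size=(120, 2)),
               rng.normal(4, 1, size=(120, 2))])

# Candidate models: four covariance structures for G = 2 components
models = [GaussianMixture(2, covariance_type=c, random_state=0).fit(X)
          for c in ("full", "tied", "diag", "spherical")]
w = bic_weights([m.bic(X) for m in models])

# Weighted average of the aligned posterior classification probabilities
avg_post = sum(wi * aligned_post(m, models[0].means_, X)
               for wi, m in zip(w, models))
labels = avg_post.argmax(axis=1)
```

The averaged posteriors remain valid probabilities (rows sum to one), so classification proceeds exactly as with a single fitted model.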
|
6 |
Dimension Reduction and Clustering of High Dimensional Data using a Mixture of Generalized Hyperbolic Distributions
Pathmanathan, Thinesh, January 2018
Model-based clustering is a probabilistic approach that views each cluster as a component in an appropriate mixture model. The Gaussian mixture model is one of the most widely used model-based methods; however, it tends to perform poorly when clustering high-dimensional data due to the over-parameterized solutions that arise in high-dimensional spaces. This work instead considers the approach of combining dimension reduction techniques with clustering via a mixture of generalized hyperbolic distributions. The dimension reduction techniques, principal component analysis and factor analysis, along with their extensions, were reviewed. Each dimension reduction technique was then paired with the mixture of generalized hyperbolic distributions in order to demonstrate the clustering performance achieved under each method, using both simulated and real data sets. For a majority of the data sets, the clustering method utilizing principal component analysis exhibited better classification results than the clustering method based on extending the factor analysis model. / Thesis / Master of Science (MSc)
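The reduce-then-cluster pipeline can be sketched as follows. A Gaussian mixture stands in for the mixture of generalized hyperbolic distributions, which scikit-learn does not provide; the data and dimensions are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

# 20-dimensional data whose group structure lives in a 2-D subspace
rng = np.random.default_rng(4)
scores = np.vstack([rng.normal(-4, 1, size=(100, 2)),
                    rng.normal(4, 1, size=(100, 2))])
basis = np.linalg.qr(rng.normal(size=(20, 2)))[0]
X = scores @ basis.T + 0.1 * rng.normal(size=(200, 20))

Z = PCA(n_components=2).fit_transform(X)                    # reduce first,
labels = GaussianMixture(2, random_state=0).fit_predict(Z)  # then cluster
```

Fitting the mixture in the reduced space avoids the over-parameterized solutions that a 20-dimensional mixture model would face.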
|
7 |
Multivariate longitudinal data clustering with a copula kernel mixture model
Zhang, Xi, January 2024
Many common clustering methods cannot be used for clustering multivariate longitudinal data when the covariance of random variables is a function of the time points. For this reason, a copula kernel mixture model (CKMM) is proposed for clustering such data. The CKMM is a finite mixture model that decomposes each mixture component's joint density function into a copula and marginal distribution functions, where a Gaussian copula is used for its mathematical tractability. This thesis considers three scenarios: first, the CKMM is developed for balanced multivariate longitudinal data with known eigenfunctions; second, the CKMM is used to fit unbalanced data where trajectories are aligned on the time axis, and eigenfunctions are unknown; and lastly, a dynamic CKMM (DCKMM) is applied to unbalanced data where trajectories are misaligned, and eigenfunctions are unknown. Expectation-maximization type algorithms are used for parameter estimation. The performance of CKMM is demonstrated on both simulated and real data. / Thesis / Candidate in Philosophy
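The copula-times-marginals decomposition at the heart of the CKMM can be checked in miniature. With normal marginals and a Gaussian copula, the product of the copula density and the marginal densities recovers the bivariate normal density exactly; the numbers below are arbitrary illustrations:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def gaussian_copula_density(u, R):
    """Density of the Gaussian copula with correlation matrix R at u in (0,1)^d:
    c(u) = |R|^{-1/2} exp(-0.5 z'(R^{-1} - I)z), where z = Phi^{-1}(u)."""
    z = norm.ppf(u)
    quad = z @ (np.linalg.inv(R) - np.eye(len(R))) @ z
    return np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(R))

R = np.array([[1.0, 0.6], [0.6, 1.0]])
mu, sd = np.array([1.0, -2.0]), np.array([2.0, 0.5])
x = np.array([1.5, -1.8])

# Joint density as copula density times marginal densities
u = norm.cdf((x - mu) / sd)
joint = gaussian_copula_density(u, R) * np.prod(norm.pdf(x, mu, sd))

# Direct bivariate normal density for comparison
Sigma = np.outer(sd, sd) * R
direct = multivariate_normal(mu, Sigma).pdf(x)
```

The decomposition is what lets the CKMM swap in arbitrary marginal distributions while keeping the dependence structure in the copula.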
|
8 |
Dimension Reduction for Model-based Clustering via Mixtures of Multivariate t-Distributions
Morris, Katherine, 21 August 2012
We introduce a dimension reduction method for model-based clustering obtained from a finite mixture of t-distributions. This approach is based on existing work on reducing dimensionality in the case of finite Gaussian mixtures. The method relies on identifying a reduced subspace of the data by considering how much group means and group covariances vary. This subspace contains linear combinations of the original data, which are ordered by importance via the associated eigenvalues. Observations can be projected onto the subspace and the resulting set of variables captures most of the clustering structure available in the data. The approach is illustrated using simulated and real data. / Paul McNicholas
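A simplified, means-only sketch of the subspace estimation described above: directions are ranked by how much the group means vary relative to the pooled covariance (the thesis's method also uses variation in the group covariances, which is omitted here; `clustering_directions` is an illustrative name):

```python
import numpy as np

def clustering_directions(X, labels):
    """Directions along which group means vary most relative to the pooled
    covariance, found via the generalized eigenproblem B v = lambda Sigma v
    and ordered by the associated eigenvalues."""
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    B = np.zeros_like(Sigma)                   # between-group covariance
    for g in np.unique(labels):
        Xg = X[labels == g]
        diff = (Xg.mean(axis=0) - mu)[:, None]
        B += (len(Xg) / len(X)) * diff @ diff.T
    vals, vecs = np.linalg.eig(np.linalg.solve(Sigma, B))
    order = np.argsort(vals.real)[::-1]
    return vals.real[order], vecs.real[:, order]

rng = np.random.default_rng(5)
X = rng.normal(0, 1, size=(300, 3))
X[:150, 0] -= 3; X[150:, 0] += 3               # groups differ only along axis 0
labels = np.array([0] * 150 + [1] * 150)
vals, vecs = clustering_directions(X, labels)
```

Projecting the data onto the leading eigenvectors retains most of the clustering structure while discarding directions that carry none.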
|
9 |
Mixtures of Skew-t Factor Analyzers
Murray, Paula, 11 1900
Model-based clustering allows for the identification of subgroups in a data set through the use of finite mixture models. When applied to high-dimensional microarray data, we can discover groups of genes characterized by their gene expression profiles. In this thesis, a mixture of skew-t factor analyzers is introduced for the clustering of high-dimensional data. Notably, we make use of a version of the skew-t distribution which has not previously appeared in mixture-modelling literature. Allowing a constraint on the factor loading matrix leads to two mixtures of skew-t factor analyzers models. These models are implemented using the alternating expectation-conditional maximization algorithm for parameter estimation with an Aitken's acceleration stopping criterion used to determine convergence. The Bayesian information criterion is used for model selection and the performance of each model is assessed using the adjusted Rand index. The models are applied to both real and simulated data, obtaining clustering results which are equivalent or superior to those of established clustering methods.
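The adjusted Rand index (ARI) used above to assess each model corrects the Rand index for chance agreement: 1.0 means the two partitions agree perfectly (up to label permutation), and values near 0 indicate agreement no better than random. A quick illustration with made-up labels:

```python
from sklearn.metrics import adjusted_rand_score

true = [0, 0, 0, 1, 1, 1]
perm = [1, 1, 1, 0, 0, 0]   # same partition, labels swapped
rand = [0, 1, 0, 1, 0, 1]   # no relation to the true grouping

print(adjusted_rand_score(true, perm))  # 1.0: identical partitions
print(adjusted_rand_score(true, rand))  # near zero
```

Because it is permutation-invariant, the ARI is the natural yardstick for clustering, where component labels carry no meaning.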
|
10 |
Longitudinal Clustering via Mixtures of Multivariate Power Exponential Distributions
Patel, Nidhi, January 2016
A mixture model approach for clustering longitudinal data is introduced. The approach, which is based on mixtures of multivariate power exponential distributions, allows for varying tail-weight and peakedness in data. In the longitudinal setting, this corresponds to more or less concentration around the most central time course in a component. The models utilize a modified Cholesky decomposition of the component scale matrices and the associated maximum likelihood estimators are derived via a generalized expectation-maximization algorithm. / Thesis / Master of Science (MSc)
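The modified Cholesky decomposition mentioned above writes a scale matrix as T Σ T' = D, with T unit lower-triangular (whose negative sub-diagonal entries are autoregressive coefficients across time points) and D diagonal (innovation variances). A small numerical sketch with an assumed AR(1)-style covariance:

```python
import numpy as np

def modified_cholesky(Sigma):
    """Modified Cholesky decomposition of a covariance/scale matrix:
    returns unit lower-triangular T and diagonal D with T Sigma T' = D."""
    L = np.linalg.cholesky(Sigma)          # Sigma = L L'
    d = np.diag(L)
    T = np.linalg.inv(L / d[None, :])      # unit lower-triangular
    D = np.diag(d ** 2)
    return T, D

# AR(1)-like covariance over 4 time points, correlation 0.7
rho, t = 0.7, np.arange(4)
Sigma = rho ** np.abs(t[:, None] - t[None, :])
T, D = modified_cholesky(Sigma)
```

For this AR(1) structure, T carries -0.7 on its first sub-diagonal and zeros below it, reflecting that each time point depends only on the immediately preceding one; this interpretability over time is why the decomposition suits longitudinal scale matrices.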
|