1

Hyperbolic Distributions and Transformations for Clustering Incomplete Data with Extensions to Matrix Variate Normality

Pocuca, Nikola January 2023 (has links)
Under realistic scenarios, data are often incomplete, asymmetric, or high-dimensional. More intricate data structures often render standard approaches infeasible due to methodological or computational limitations. This monograph consists of four contributions, each solving a specific problem within model-based clustering. An R package is developed that implements a three-phase imputation method for both elliptical and hyperbolic parsimonious models. A novel stochastic technique is employed to speed up computations for hyperbolic distributions, demonstrating superior performance overall. A hyperbolic transformation model is conceived for clustering asymmetric data within a heterogeneous context. Finally, for high-dimensional data, a framework is developed for assessing matrix variate normality within three-way data sets. All things considered, this work constitutes a powerful set of tools for dealing with the ever-growing complexity of big data. / Dissertation / Doctor of Science (PhD)
2

Subspace Clustering with the Multivariate-t Distribution

Pesevski, Angelina January 2017 (has links)
Clustering procedures suitable for the analysis of very high-dimensional data are needed for many modern data sets. One model-based clustering approach, called high-dimensional data clustering (HDDC), uses a family of Gaussian mixture models to model the sub-populations of the observed data, i.e., to perform cluster analysis. The HDDC approach is based on the idea that high-dimensional data usually lie in lower-dimensional subspaces; as such, the dimension of each subspace, called the intrinsic dimension, can be estimated for each sub-population of the observed data. As a result, each of these Gaussian mixture models can be fitted using only a fraction of the total number of model parameters. This family of models has gained attention due to its superior classification performance compared to other families of mixture models; however, it still suffers from the usual limitations of Gaussian mixture model-based approaches. Herein, a robust analogue of the HDDC approach is proposed. This approach, which extends the HDDC procedure to include the multivariate-t distribution, encompasses 28 models that rectify one of the major shortcomings of the HDDC procedure. Our tHDDC procedure is fitted to both simulated and real data sets and is compared to the HDDC procedure using an image reconstruction problem that arose from satellite imagery of Mars' surface. / Thesis / Master of Science (MSc)
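For reference, the component densities underlying such a t-based family are multivariate-t. A generic multivariate t-mixture density, written before any HDDC-style subspace constraints are imposed on the component scale matrices (notation here is generic, not taken from the thesis itself), is

```latex
f(\mathbf{x} \mid \boldsymbol{\vartheta})
  = \sum_{g=1}^{G} \pi_g \, f_t(\mathbf{x} \mid \boldsymbol{\mu}_g, \boldsymbol{\Sigma}_g, \nu_g),
\qquad
f_t(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu)
  = \frac{\Gamma\!\left(\tfrac{\nu + p}{2}\right) |\boldsymbol{\Sigma}|^{-1/2}}
         {(\pi \nu)^{p/2} \, \Gamma\!\left(\tfrac{\nu}{2}\right)
          \left[1 + \tfrac{\delta(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})}{\nu}\right]^{(\nu + p)/2}}
```

where δ(x; μ, Σ) = (x − μ)ᵀΣ⁻¹(x − μ) is the squared Mahalanobis distance, the π_g are mixing proportions summing to one, and the degrees of freedom ν_g control the heavier tails that make the t-based family more robust than its Gaussian counterpart.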
3

Cross-Validation for Model Selection in Model-Based Clustering

O'Reilly, Rachel 04 September 2012 (has links)
Clustering is a technique used to partition unlabelled data into meaningful groups. This thesis will focus on the area of clustering called model-based clustering, where it is assumed that data arise from a finite number of subpopulations, each of which follows a known statistical distribution. The number of groups and the shape of each group are unknown in advance, and thus one of the most challenging aspects of clustering is selecting these features. Cross-validation is a model selection technique which is often used in regression and classification because it tends to choose models that predict well and are not over-fit to the data. However, it has rarely been applied in a clustering framework. Herein, cross-validation is applied to select the number of groups and covariance structure within a family of Gaussian mixture models. Results are presented for both real and simulated data. / Ontario Graduate Scholarship Program
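As a rough sketch of this idea, and not the exact procedure developed in the thesis, the following Python snippet selects the number of groups and a covariance structure by held-out log-likelihood. Scikit-learn's four covariance types stand in for the parsimonious Gaussian family considered in the thesis, and the simulated blob data are purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold

# Toy unlabelled data with three underlying groups.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.2, random_state=1)

def cv_loglik(X, n_components, covariance_type, n_splits=5):
    """Average held-out log-likelihood per observation for one candidate model."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True, random_state=1).split(X):
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type=covariance_type,
                              random_state=1).fit(X[train_idx])
        scores.append(gmm.score(X[test_idx]))  # mean log-likelihood on the held-out fold
    return np.mean(scores)

# Search over the number of groups and a few covariance structures,
# keeping the combination that predicts held-out data best.
candidates = [(g, cov) for g in range(1, 7) for cov in ("full", "tied", "diag", "spherical")]
best = max(candidates, key=lambda c: cv_loglik(X, *c))
print("selected number of groups and covariance structure:", best)
```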
4

Evolutionary Algorithms for Model-Based Clustering

Kampo, Regina S. January 2021 (has links)
Cluster analysis is used to detect underlying group structure in data. Model-based clustering is the process of performing cluster analysis by fitting finite mixture models. However, parameter estimation in mixture model-based approaches to clustering is notoriously difficult. To this end, this thesis focuses on the development of evolutionary computation as an alternative technique for parameter estimation in mixture models. An evolutionary algorithm is proposed and illustrated on the well-established Gaussian mixture model with missing values. Next, the family of Gaussian parsimonious clustering models is considered, and an evolutionary algorithm is developed to estimate its parameters. Finally, an evolutionary algorithm is developed for latent Gaussian mixture models to facilitate the flexible clustering of high-dimensional data. For all models and families of models considered in this thesis, the proposed algorithms for model fitting and parameter estimation are presented, and their performance is illustrated on real and simulated data sets to assess the clustering ability of each model. This thesis concludes with a discussion and suggestions for future work. / Dissertation / Doctor of Philosophy (PhD)
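A toy illustration of the general mechanism (fitness evaluation, selection, crossover, mutation) is sketched below in Python. It is deliberately simplified: only component means are estimated, with identity covariances and equal mixing proportions, whereas the algorithms in the thesis handle full parsimonious covariance structures, missing values, and latent factors.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.datasets import make_blobs

rng = np.random.default_rng(0)
X, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.8, random_state=0)
G, p, pop_size, n_gen = 3, X.shape[1], 40, 60

def log_lik(means):
    """Fitness: mixture log-likelihood with equal weights and identity covariances."""
    dens = np.column_stack([multivariate_normal.pdf(X, mean=m, cov=np.eye(p)) for m in means])
    return np.sum(np.log(dens.mean(axis=1) + 1e-300))

# Initial population: each candidate solution is a set of G component means drawn from the data.
population = [X[rng.choice(len(X), G, replace=False)] for _ in range(pop_size)]

for _ in range(n_gen):
    fitness = np.array([log_lik(ind) for ind in population])
    parents = [population[i] for i in np.argsort(fitness)[-pop_size // 2:]]  # truncation selection
    children = []
    while len(children) < pop_size - len(parents):
        a, b = rng.choice(len(parents), 2, replace=False)
        mask = rng.random(G) < 0.5                       # crossover: swap component means
        child = np.where(mask[:, None], parents[a], parents[b])
        child = child + rng.normal(0, 0.1, child.shape)  # mutation: small Gaussian perturbation
        children.append(child)
    population = parents + children

best = max(population, key=log_lik)
print("estimated component means:\n", np.round(best, 2))
```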
5

Non-Gaussian Mixture Model Averaging for Clustering

Zhang, Xu Xuan January 2017 (has links)
The Gaussian mixture model has been used for model-based clustering for decades, and most model-based clustering analyses are still based on it. Model averaging approaches for Gaussian mixture models were proposed by Wei and McNicholas, based on a family of 14 Gaussian parsimonious clustering models. In this thesis, we use non-Gaussian mixture models, namely the tEigen family, for our averaging approaches. Rather than selecting a single best model, this thesis studies constructing an averaged model from a set of multivariate t-mixture models. / Thesis / Master of Science (MSc)
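One common formulation of mixture model averaging, given here only as background (the Wei and McNicholas scheme involves additional steps, such as an Occam's window and the matching of components across models), weights each fitted model m by a BIC-based approximation to its posterior probability and then averages the classification probabilities:

```latex
w_m = \frac{\exp\{-\tfrac{1}{2}\Delta_m\}}{\sum_{m'} \exp\{-\tfrac{1}{2}\Delta_{m'}\}},
\qquad
\Delta_m = \mathrm{BIC}_m - \min_{m'} \mathrm{BIC}_{m'},
\qquad
\hat{z}_{ig} = \sum_{m} w_m \, \hat{z}_{ig}^{(m)},
```

where BIC here follows the convention that smaller values indicate better models, and ẑ_ig^(m) is the estimated probability that observation i belongs to group g under model m.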
6

Dimension Reduction and Clustering of High Dimensional Data using a Mixture of Generalized Hyperbolic Distributions

Pathmanathan, Thinesh January 2018 (has links)
Model-based clustering is a probabilistic approach that views each cluster as a component in an appropriate mixture model. The Gaussian mixture model is one of the most widely used model-based methods. However, this model tends to perform poorly when clustering high-dimensional data due to the over-parametrized solutions that arise in high-dimensional spaces. This work instead considers the approach of combining dimension reduction techniques with clustering via a mixture of generalized hyperbolic distributions. The dimension reduction techniques principal component analysis and factor analysis, along with their extensions, were reviewed. These dimension reduction techniques were then individually paired with the mixture of generalized hyperbolic distributions in order to demonstrate the clustering performance achieved under each method using both simulated and real data sets. For a majority of the data sets, the clustering method utilizing principal component analysis exhibited better classification results than the clustering method based on extending the factor analysis model. / Thesis / Master of Science (MSc)
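The reduce-then-cluster pipeline can be illustrated in a few lines of Python. Scikit-learn has no mixture of generalized hyperbolic distributions, so a Gaussian mixture is used below purely as a stand-in, and the iris data and two-component projection are arbitrary choices for the example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)

# Reduce to a small number of principal components, then cluster in that space.
scores = PCA(n_components=2).fit_transform(X)

# Stand-in mixture model for the generalized hyperbolic mixture used in the thesis.
labels = GaussianMixture(n_components=3, random_state=0).fit_predict(scores)
print("ARI against the known species labels:", round(adjusted_rand_score(y, labels), 3))
```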
7

Dimension Reduction for Model-based Clustering via Mixtures of Multivariate t-Distributions

Morris, Katherine 21 August 2012 (has links)
We introduce a dimension reduction method for model-based clustering obtained from a finite mixture of t-distributions. This approach is based on existing work on reducing dimensionality in the case of finite Gaussian mixtures. The method relies on identifying a reduced subspace of the data by considering how much group means and group covariances vary. This subspace contains linear combinations of the original data, which are ordered by importance via the associated eigenvalues. Observations can be projected onto the subspace and the resulting set of variables captures most of the clustering structure available in the data. The approach is illustrated using simulated and real data. / Paul McNicholas
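As a simplified illustration of how such a subspace can be ordered by importance, consider only the variation in the fitted group means (the full method also incorporates a term for the variation in the group covariances). With mixing proportions π_g, component means μ_g, overall mean μ̄ = Σ_g π_g μ_g, and overall covariance Σ, candidate directions v and their eigenvalues λ solve

```latex
\mathbf{M}_B \mathbf{v} = \lambda \boldsymbol{\Sigma} \mathbf{v},
\qquad
\mathbf{M}_B = \sum_{g=1}^{G} \pi_g (\boldsymbol{\mu}_g - \bar{\boldsymbol{\mu}})(\boldsymbol{\mu}_g - \bar{\boldsymbol{\mu}})^{\top},
```

so directions with larger λ carry more of the between-group structure, and the data can be projected onto the leading eigenvectors.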
8

Mixtures of Skew-t Factor Analyzers

Murray, Paula 11 1900 (has links)
Model-based clustering allows for the identification of subgroups in a data set through the use of finite mixture models. When applied to high-dimensional microarray data, we can discover groups of genes characterized by their gene expression profiles. In this thesis, a mixture of skew-t factor analyzers is introduced for the clustering of high-dimensional data. Notably, we make use of a version of the skew-t distribution which has not previously appeared in mixture-modelling literature. Allowing a constraint on the factor loading matrix leads to two mixtures of skew-t factor analyzers models. These models are implemented using the alternating expectation-conditional maximization algorithm for parameter estimation with an Aitken's acceleration stopping criterion used to determine convergence. The Bayesian information criterion is used for model selection and the performance of each model is assessed using the adjusted Rand index. The models are applied to both real and simulated data, obtaining clustering results which are equivalent or superior to those of established clustering methods.
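For context, a (skew-)t factor analyzer gives each component scale matrix the usual factor-analytic decomposition, which is what keeps the model tractable for high-dimensional microarray data; the constraint mentioned above corresponds to sharing the loading matrix across components:

```latex
\boldsymbol{\Sigma}_g = \boldsymbol{\Lambda}_g \boldsymbol{\Lambda}_g^{\top} + \boldsymbol{\Psi}_g,
```

where Λ_g is a p × q matrix of factor loadings with q ≪ p and Ψ_g is a diagonal matrix, reducing the number of free scale parameters per component from p(p+1)/2 to roughly p(q+1); constraining Λ_g = Λ for all g yields the second model in the pair.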
9

Longitudinal Clustering via Mixtures of Multivariate Power Exponential Distributions

Patel, Nidhi January 2016 (has links)
A mixture model approach for clustering longitudinal data is introduced. The approach, which is based on mixtures of multivariate power exponential distributions, allows for varying tail-weight and peakedness in data. In the longitudinal setting, this corresponds to more or less concentration around the most central time course in a component. The models utilize a modified Cholesky decomposition of the component scale matrices and the associated maximum likelihood estimators are derived via a generalized expectation-maximization algorithm. / Thesis / Master of Science (MSc)
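For reference, each component density is multivariate power exponential, written here up to its normalizing constant; the shape parameter β governs the varying tail weight and peakedness referred to above (β = 1 recovers the Gaussian, β < 1 gives heavier tails). The modified Cholesky decomposition of each component scale matrix is also shown:

```latex
f(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \beta)
  \propto |\boldsymbol{\Sigma}|^{-1/2}
    \exp\!\left\{-\tfrac{1}{2}\left[(\mathbf{x}-\boldsymbol{\mu})^{\top}
      \boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]^{\beta}\right\},
\qquad
\boldsymbol{\Sigma}_g^{-1} = \mathbf{T}_g^{\top} \mathbf{D}_g^{-1} \mathbf{T}_g,
```

where T_g is a unit lower-triangular matrix and D_g is diagonal; in the longitudinal setting the entries of T_g and D_g have a natural autoregressive interpretation across time points.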
10

An Evolutionary Algorithm for Matrix-Variate Model-Based Clustering

Flynn, Thomas J. January 2023 (has links)
Model-based clustering is the use of finite mixture models to identify underlying group structures in data. Estimating parameters for mixture models is notoriously difficult, with the expectation-maximization (EM) algorithm being the predominant method. An alternative approach is the evolutionary algorithm (EA) which emulates natural selection on a population of candidate solutions. By leveraging a fitness function and genetic operators like crossover and mutation, EAs offer a distinct way to search the likelihood surface. EAs have been developed for model-based clustering in the multivariate setting; however, there is a growing interest in matrix-variate distributions for three-way data applications. In this context, we propose an EA for finite mixtures of matrix-variate distributions. / Thesis / Master of Science (MSc)
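For reference, the mixture fitted in the matrix-variate normal case has the density below, and the EA's fitness function would typically be the corresponding log-likelihood evaluated at a candidate parameter set:

```latex
f(\mathbf{X} \mid \boldsymbol{\vartheta})
  = \sum_{g=1}^{G} \pi_g \, \varphi_{n \times p}(\mathbf{X} \mid \mathbf{M}_g, \mathbf{U}_g, \mathbf{V}_g),
\qquad
\varphi_{n \times p}(\mathbf{X} \mid \mathbf{M}, \mathbf{U}, \mathbf{V})
  = \frac{\exp\!\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left[\mathbf{V}^{-1}(\mathbf{X}-\mathbf{M})^{\top}\mathbf{U}^{-1}(\mathbf{X}-\mathbf{M})\right]\right\}}
         {(2\pi)^{np/2}\, |\mathbf{U}|^{p/2}\, |\mathbf{V}|^{n/2}},
```

where each observation X is an n × p matrix, M_g is the component mean matrix, and U_g (n × n) and V_g (p × p) are the row and column covariance matrices.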
