Multivariate data are prevalent in modern applications, yet they often present significant analytical challenges. Factor models offer an effective tool for addressing the issues associated with large-scale datasets. In this dissertation, we propose two novel Bayesian factor models. Because the number of latent factors is typically much smaller than the dimension of the observation vectors, the proposed models achieve substantial dimension reduction.
Our first model is for spatiotemporal areal data. In this setting, the region of interest is divided into subregions, and at each time point there is one univariate observation per subregion. Our model writes the vector of observations at each time point in factor model form, as the product of a matrix of factor loadings and a vector of common factors plus a vector of errors. The common factors evolve through time according to a dynamic linear model. To represent the spatial relationships among subregions, each column of the factor loadings matrix is assigned an intrinsic conditional autoregressive (ICAR) prior. We therefore call our approach the Dynamic ICAR Spatiotemporal Factor Model (DIFM).
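The structure described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the dissertation's implementation: the abstract does not give the evolution equation, the error distribution, or the map, so the random-walk evolution, the Gaussian errors, the line-shaped toy map, and all names and sizes below (`r`, `k`, `T`, `tau`, `icar_log_kernel`) are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: r subregions, k common factors, T time points.
r, k, T = 6, 2, 50

# Toy map (assumption): subregions on a line, adjacent ones are neighbors.
W = np.zeros((r, r))
for i in range(r - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

# ICAR precision structure Q = D - W, where D holds the neighbor counts.
# Rows of Q sum to zero, so Q is singular and the ICAR prior is improper.
Q = np.diag(W.sum(axis=1)) - W

def icar_log_kernel(b, Q, tau=1.0):
    """Unnormalized ICAR log density of one loadings column b (tau: precision)."""
    return -0.5 * tau * (b @ Q @ b)

# Stand-in loadings matrix; in the model each column gets an ICAR prior.
B = rng.normal(size=(r, k))

# Common factors follow a dynamic linear model; a random walk is assumed here.
x = np.zeros((T, k))
for t in range(1, T):
    x[t] = x[t - 1] + rng.normal(scale=0.3, size=k)

# Observation equation: y_t = B x_t + v_t, stacked over time as rows of y.
y = x @ B.T + rng.normal(scale=0.1, size=(T, r))
```

The kernel is zero for a spatially constant column and negative otherwise, so the prior favors loadings that vary smoothly between neighboring subregions.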
Our second model, the Bayesian Clustering Factor Model (BCFM), assumes that latent factors and clusters are present in the data. We place a Gaussian mixture model on the common factors to discover clusters. For both models, we develop Markov chain Monte Carlo (MCMC) algorithms to explore the posterior distribution of the parameters. To select the number of factors and, in the case of the clustering model, the number of clusters, we develop model selection criteria that use the Laplace-Metropolis estimator of the predictive density and BIC with integrated likelihood. / Doctor of Philosophy / Understanding large-scale datasets has recently emerged as one of the most significant challenges for researchers. This is particularly true for datasets that are inherently complex and nontrivial to analyze. In this dissertation, we present two novel classes of Bayesian factor models for two classes of complex datasets. The number of factors is frequently much smaller than the number of variables, so factor models can be an effective way to handle multivariate datasets. First, we develop the Dynamic ICAR Spatiotemporal Factor Model (DIFM) for datasets collected over time on a partition of a spatial domain of interest. The DIFM accounts for spatiotemporal correlation and provides predictions of future trends. Second, we develop the Bayesian Clustering Factor Model (BCFM) for multivariate data that cluster in a space of lower dimension than the vector of observations. The BCFM enables researchers to identify the distinguishing characteristics of the subgroups, offering valuable insight into their underlying structure.
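The BCFM idea described above, with clusters living in the low-dimensional factor space, can be sketched as a generative simulation. This is a minimal illustration under assumptions rather than the dissertation's model specification: the mixture weights, cluster means, noise scales, and all names and sizes (`n`, `p`, `k`, `g`, `weights`, `means`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: n observations, p variables, k factors, g clusters.
n, p, k, g = 300, 10, 2, 3

# Assumed Gaussian mixture on the common factors.
weights = np.array([0.5, 0.3, 0.2])
means = np.array([[-3.0, 0.0], [0.0, 3.0], [3.0, 0.0]])

# Draw a cluster label for each observation, then its k-dimensional factor.
z = rng.choice(g, size=n, p=weights)
x = means[z] + rng.normal(scale=0.5, size=(n, k))

# Stand-in loadings map the k factors to the p observed variables.
B = rng.normal(size=(p, k))

# Noise-free signal has rank k, so the clusters live in a k-dimensional space.
signal = x @ B.T
y = signal + rng.normal(scale=0.1, size=(n, p))
```

Here the ten observed variables inherit their cluster structure entirely from two factors, which is the sense in which the data cluster in a space of lower dimension than the observation vector.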
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/119143 |
Date | 28 May 2024 |
Creators | Shin, Hwasoo |
Contributors | Statistics, Ferreira, Marco Antonio Rosa, Tegge, Allison, Franck, Christopher Thomas, Kim, Inyoung |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Language | English |
Detected Language | English |
Type | Dissertation |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |