We live in the era of Big Data, with significantly richer computational resources than in the last two decades. The concurrence of computational resources and a large volume of data has boosted researchers' desire to develop feasible Markov chain Monte Carlo (MCMC) algorithms for large parameter spaces. Together with established MCMC methods, Dirichlet Process Mixture Models (DPMMs) have become a Bayesian mainstay for modeling heterogeneous structures, namely clusters, especially when the number of clusters is unknown. As opposed to many ad hoc clustering methods, using Dirichlet Processes (DPs) in models provides a flexible and probabilistic approach for automatically estimating both cluster structure and quantity. Although DPs are not fully parameterized, they depend on both a base measure and a concentration parameter, each of which can heavily impact inferences.
Determining the concentration parameter is critical, since it adjusts the a priori cluster expectation, but typical approaches for specifying this parameter are rather cavalier. In this work, we propose a new method for automatically and adaptively determining this parameter, which directly calibrates distances between clusters through an explicit link function within the DP. Furthermore, we extend our method to mixture models with Nested Dirichlet Processes (NDPs), which cluster multilevel data and depend on the specification of a vector of concentration parameters. We detail how to incorporate our method into Markov chain Monte Carlo algorithms, and illustrate our findings through a series of comparative simulation studies and applications.
/ Ph. D. /
We live in the era of Big Data, with significantly richer computational resources than in the last two decades. The concurrence of computational resources and a large volume of data has boosted researchers' desire to develop efficient Markov chain Monte Carlo (MCMC) algorithms for models such as the Dirichlet process mixture model. The Dirichlet process mixture model has become more popular for clustering analyses because it provides a flexible and generative model for automatically defining both cluster structure and quantity. However, a clustering solution inferred by the Dirichlet process mixture model is affected by two hyperparameters: a base measure and a concentration parameter.
Determining the concentration parameter is critical, since it adjusts the a priori cluster expectation, but typical approaches for specifying this parameter are rather cavalier. In this work, we propose a new method for automatically and adaptively determining this parameter, which directly calibrates distances between clusters. Furthermore, we extend our method to mixture models with Nested Dirichlet Processes (NDPs), which cluster multilevel data and depend on the specification of a vector of concentration parameters. We present simulation studies that show the performance of the developed methods, along with applications such as modeling timelines in building construction data and clustering U.S. median household income data.
This work makes three contributions: 1) the developed methods are straightforward to incorporate into any type of Markov chain Monte Carlo algorithm; 2) the methods calibrate to the probability distance between clusters and maximize the information from the observations in the defined clusters when estimating the concentration parameter; and 3) the methods can be extended to any extension of the Dirichlet process, for instance, hierarchical Dirichlet processes or dependent Dirichlet processes.
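As a hedged illustration of the statement that the concentration parameter adjusts the a priori cluster expectation: under the standard Chinese restaurant process representation of a DP with concentration parameter alpha, the prior expected number of occupied clusters among n observations is the sum over i = 1, ..., n of alpha / (alpha + i - 1). The sketch below only evaluates that textbook formula. The function name `expected_clusters` and the example values of alpha and n are assumptions chosen for illustration, and this is not the calibration method proposed in the dissertation.

```python
def expected_clusters(alpha: float, n: int) -> float:
    """Prior expected number of occupied clusters among n observations
    drawn from a Dirichlet process with concentration parameter alpha.

    Uses the Chinese restaurant process identity
        E[K_n] = sum_{i=1}^{n} alpha / (alpha + i - 1).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if n < 1:
        raise ValueError("n must be at least 1")
    # range(n) yields i - 1 for i = 1, ..., n
    return sum(alpha / (alpha + k) for k in range(n))


if __name__ == "__main__":
    n = 500  # hypothetical sample size
    for alpha in (0.1, 1.0, 10.0):  # hypothetical concentration values
        print(f"alpha={alpha:>5}: E[K_n] = {expected_clusters(alpha, n):.2f}")
```

Larger values of alpha give a larger prior expected number of clusters for the same n, which is why a carelessly fixed alpha can steer the inferred clustering.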
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/74970 |
Date | 08 February 2017 |
Creators | Song, Yuhyun |
Contributors | Statistics, Leman, Scotland C., Terrell, George R., House, Leanna L., Kim, Inyoung |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Dissertation |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |