
Bayesian Modeling and Variable Selection for Complex Data

As we routinely encounter high-throughput datasets in complex biological and environmental research, developing novel models and methods
for variable selection has received widespread attention. In this dissertation, we address a few key challenges in Bayesian modeling and
variable selection for high-dimensional data with complex spatial structures. a) Most Bayesian variable selection methods are restricted to
mixture priors having separate components for characterizing the signal and the noise. However, such priors encounter computational issues in
high dimensions. This has motivated continuous shrinkage priors, which resemble the two-component priors while facilitating computation and
interpretability. While such priors are widely used for estimating high-dimensional sparse vectors, selecting a subset of variables remains a
daunting task. b) Spatial and spatio-temporal data sets with complex structures are now commonly encountered in scientific research
fields ranging from atmospheric science, forestry, and environmental science to biological and social science. Selecting the
spatial variables that significantly influence the occurrence of events is essential for providing insights to
researchers. Self-excitation, the feature that the occurrence of an event increases the likelihood of further events of the same type
nearby in time and space, can be found in many natural and social phenomena. Research on modeling data with self-excitation has
drawn increasing interest recently. However, the existing literature on self-exciting models that include high-dimensional spatial
covariates is still underdeveloped. c) The Gaussian process is among the most powerful modeling frameworks for spatial data. Its major bottleneck is the
computational complexity, which stems from the inversion of dense matrices associated with a Gaussian process covariance. Hierarchical
divide-and-conquer Gaussian process models have been investigated for ultra-large data sets. However, the computation involved in scaling the
distributed computing algorithm to handle a large number of sub-groups poses a serious bottleneck. In Chapter 2 of this dissertation, we
propose a general approach for variable selection with shrinkage priors. The presence of very few tuning parameters makes our method
attractive in comparison to ad hoc thresholding approaches. The approach is not limited to continuous shrinkage priors;
it can be used with any shrinkage prior. Theoretical properties for near-collinear design matrices are investigated, and the method is
shown to have good performance in a wide range of synthetic data examples and in a real data example on selecting genes affecting survival
due to lymphoma. In Chapter 3 of this dissertation, we propose a new self-exciting model that allows the inclusion of spatial covariates. We
develop algorithms which are effective in obtaining accurate estimation and variable selection results in a variety of synthetic data
examples. Our proposed model is applied to Chicago crime data, where the influence of various spatial features is investigated. In Chapter 4,
we focus on a hierarchical Gaussian Process regression model for ultra-high dimensional spatial datasets. By evaluating the latent Gaussian
process on a regular grid, we propose an efficient computational algorithm through circulant embedding. The latent Gaussian process borrows
information across multiple sub-groups, thereby obtaining a more accurate prediction. The hierarchical model and our proposed algorithm are
studied through simulation examples. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the
degree of Doctor of Philosophy. / Fall Semester 2017. / October 23, 2017. / Includes bibliographical references. / Debdeep Pati, Professor Co-Directing Dissertation; Fred Huffer, Professor Co-Directing Dissertation;
Alec Kercheval, University Representative; Debajyoti Sinha, Committee Member; Jonathan Bradley, Committee Member.

Identifier: oai:union.ndltd.org:fsu.edu/oai:fsu.digital.flvc.org:fsu_604984
Contributors: Li, Hanning (author), Pati, Debdeep (professor co-directing dissertation), Huffer, Fred W. (Fred William) (professor co-directing dissertation), Kercheval, Alec N. (university representative), Sinha, Debajyoti (committee member), Bradley, Jonathan R. (committee member), Florida State University (degree granting institution), College of Arts and Sciences (degree granting college), Department of Statistics (degree granting department)
Publisher: Florida State University
Source Sets: Florida State University
Language: English
Detected Language: English
Type: Text, doctoral thesis
Format: 1 online resource (91 pages), computer, application/pdf
