Return to search

Scalable Estimation and Testing for Complex, High-Dimensional Data

With modern high-throughput technologies, scientists can now collect high-dimensional data of various forms, including brain images, medical spectrum curves, engineering signals, etc. These data provide a rich source of information on disease development, cell evolvement, engineering systems, and many other scientific phenomena. To achieve a clearer understanding of the underlying mechanism, one needs a fast and reliable analytical approach to extract useful information from the wealth of data. The goal of this dissertation is to develop novel methods that enable scalable estimation, testing, and analysis of complex, high-dimensional data. It contains three parts: parameter estimation based on complex data, powerful testing of functional data, and the analysis of functional data supported on manifolds. The first part focuses on a family of parameter estimation problems in which the relationship between data and the underlying parameters cannot be explicitly specified using a likelihood function. We introduce a wavelet-based approximate Bayesian computation approach that is likelihood-free and computationally scalable. This approach will be applied to two applications: estimating mutation rates of a generalized birth-death process based on fluctuation experimental data and estimating the parameters of targets based on foliage echoes. The second part focuses on functional testing. We consider using multiple testing in basis-space via p-value guided compression. Our theoretical results demonstrate that, under regularity conditions, the Westfall-Young randomization test in basis space achieves strong control of family-wise error rate and asymptotic optimality. Furthermore, appropriate compression in basis space leads to improved power as compared to point-wise testing in data domain or basis-space testing without compression. The effectiveness of the proposed procedure is demonstrated through two applications: the detection of regions of spectral curves associated with pre-cancer using 1-dimensional fluorescence spectroscopy data and the detection of disease-related regions using 3-dimensional Alzheimer's Disease neuroimaging data. The third part focuses on analyzing data measured on the cortical surfaces of monkeys' brains during their early development, and subjects are measured on misaligned time markers. In this analysis, we examine the asymmetric patterns and increase/decrease trend in the monkeys' brains across time. / Doctor of Philosophy / With modern high-throughput technologies, scientists can now collect high-dimensional data of various forms, including brain images, medical spectrum curves, engineering signals, and biological measurements. These data provide a rich source of information on disease development, engineering systems, and many other scientific phenomena. The goal of this dissertation is to develop novel methods that enable scalable estimation, testing, and analysis of complex, high-dimensional data. It contains three parts: parameter estimation based on complex biological and engineering data, powerful testing of high-dimensional functional data, and the analysis of functional data supported on manifolds. The first part focuses on a family of parameter estimation problems in which the relationship between data and the underlying parameters cannot be explicitly specified using a likelihood function. We introduce a computation-based statistical approach that achieves efficient parameter estimation scalable to high-dimensional functional data. The second part focuses on developing a powerful testing method for functional data that can be used to detect important regions. We will show nice properties of our approach. The effectiveness of this testing approach will be demonstrated using two applications: the detection of regions of the spectrum that are related to pre-cancer using fluorescence spectroscopy data and the detection of disease-related regions using brain image data. The third part focuses on analyzing brain cortical thickness data, measured on the cortical surfaces of monkeys’ brains during early development. Subjects are measured on misaligned time-markers. By using functional data estimation and testing approach, we are able to: (1) identify asymmetric regions between their right and left brains across time, and (2) identify spatial regions on the cortical surface that reflect increase or decrease in cortical measurements over time.

Identiferoai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/93223
Date22 August 2019
CreatorsLu, Ruijin
ContributorsStatistics, Zhu, Hongxiao, Wu, Xiaowei, Deng, Xinwei, Kim, Inyoung
PublisherVirginia Tech
Source SetsVirginia Tech Theses and Dissertation
Detected LanguageEnglish
TypeDissertation
FormatETD, application/pdf, application/pdf
RightsIn Copyright, http://rightsstatements.org/vocab/InC/1.0/

Page generated in 0.0027 seconds