Return to search

Semiparametric Bayesian Kernel Survival Model for Highly Correlated High-Dimensional Data.

We are living in an era in which many mysteries related to science, technologies and design can be answered by "learning" the huge amount of data accumulated over the past few decades. In the processes of those endeavors, highly-correlated high-dimensional data are frequently observed in many areas including predicting shelf life, controlling manufacturing processes, and identifying important pathways related with diseases. We define a "set" as a group of highly-correlated high-dimensional (HCHD) variables that possess a certain practical meaning or control a certain process, and define an "element" as one of the HCHD variables within a certain set. Such an elements-within-a-set structure is very complicated because: (i) the dimensions of elements in different sets can vary dramatically, ranging from two to hundreds or even thousands; (ii) the true relationships, include element-wise associations, set-wise interactions, and element-set interactions, are unknown; (iii) and the sample size (n) is usually much smaller than the dimension of the elements (p). The goal of this dissertation is to provide a systematic way to identify both the set effects and the element effects associated with survival outcomes from heterogeneous populations using Bayesian survival kernel models. By connecting kernel machines with semiparametric Bayesian hierarchical models, the proposed unified model frameworks can identify significant elements as well as sets regardless of mis-specifications of distributions or kernels. The proposed methods can potentially be applied to a vast range of fields to solve real-world problems. / PHD / We are living in an era in which many mysteries related to science, technologies and design can be answered by “learning” the huge amount of data accumulated over the past few decades. In the processes of those endeavors, highly-correlated high-dimensional data are frequently observed in many areas including predicting shelf life, controlling manufacturing processes, and identifying important pathways related with diseases. For example, for a group of 30 patients in a medical study, values for an immense number of variables like gender, age, height, weight, and blood pressure of each patient are recorded. High-dimensional means the number of variables (i.e. p) could be very large (e.g. p > 500), while the number of subjects or the sample size (i.e. n) is small (n = 30). We define a “set” as a group of highly-correlated high-dimensional (HCHD) variables that possess a certain practical meaning or control a certain process, and define an “element” as one of the HCHD variables within a certain set. Such an elements-within-a-set structure is very complicated because: (i) the dimensions of elements in different sets can vary dramatically, ranging from two to hundreds or even thousands; (ii) the true relationships, include element-wise associations, set-wise interactions, and element-set interactions, are unknown; (iii) and the sample size (n) is usually much smaller than the dimension of the elements (p). The goal of this dissertation is to provide a systematic way to identify both the set effects and the element effects associated with survival outcomes from heterogeneous populations using different proposed statistical models. The proposed models can incorporate prior knowledge to boost the model performance. The proposed methods can potentially be applied to a vast range of fields to solve real-world problems.

Identiferoai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/95040
Date01 May 2018
CreatorsZhang, Lin
ContributorsStatistics, Kim, Inyoung, House, Leanna L., Hong, Yili, Du, Pang
PublisherVirginia Tech
Source SetsVirginia Tech Theses and Dissertation
Detected LanguageEnglish
TypeDissertation
FormatETD, application/pdf
RightsIn Copyright, http://rightsstatements.org/vocab/InC/1.0/

Page generated in 0.0016 seconds