
Dimension reduction methods for nonlinear association analysis with applications to omics data

With advances in high-throughput techniques, the availability of large-scale omics data has revolutionized the fields of medicine and biology, and has offered a better understanding of the underlying biological mechanisms. However, the high dimensionality and the unknown association structure between different data types make statistical integration analyses challenging. In this dissertation, we develop three dimensionality reduction methods to detect nonlinear association structures in omics data. First, we propose a method for variable selection in a nonparametric additive quantile regression framework. We enforce a network regularization to incorporate information encoded by known networks. To account for nonlinear associations, we approximate the additive functional effect of each predictor with a B-spline basis expansion. We apply the group Lasso penalty to achieve sparsity. We define the network-constrained penalty by penalizing the difference between the effect functions of any two linked genes (predictors) in the network. Simulation studies show that our proposed method identifies truly associated genes well, with fewer falsely selected genes than alternative approaches. Second, we develop a canonical correlation analysis (CCA)-based method, canonical distance correlation analysis (CDCA), which leverages the distance correlation to capture the overall association between two sets of variables. The CDCA allows untangling linear and nonlinear dependence structures. Third, we develop the sparse CDCA (sCDCA) method to achieve sparsity and improve result interpretability by adding penalties on the loadings from the CDCA. The sCDCA method can be applied to data with large dimensionality and small sample size. We develop iterative majorization-minimization-based coordinate descent algorithms to compute the loadings in the CDCA and sCDCA methods.
Simulation studies show that the proposed CDCA and sCDCA approaches outperform classical CCA and sparse CCA (sCCA) in nonlinear settings and perform similarly in linear association settings. We apply the proposed methods to data from the Framingham Heart Study (FHS) to identify body mass index-associated genes, the association structure between metabolic disorders and metabolite profiles, and a subset of metabolites and their associated type 2 diabetes (T2D)-related genes. / 2023-11-05T00:00:00Z
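
As an illustration of why distance correlation can capture dependence that a linear measure misses, the following is a minimal sketch of the standard sample distance correlation (the biased, double-centered estimator). It is not the CDCA or sCDCA algorithm from the dissertation; the function names, the simulated data, and the quadratic relationship are assumptions chosen only for illustration.

```python
import numpy as np


def _centered_distances(z):
    """Pairwise Euclidean distances of the rows of z, double-centered."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a 1-D input as n samples of one variable
    diff = z[:, None, :] - z[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract row and column means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between x and y (same number of rows)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()                    # squared distance covariance
    dvar2 = (a * a).mean() * (b * b).mean()   # product of squared distance variances
    if dvar2 <= 0:
        return 0.0  # at least one input is constant
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))


# Hypothetical example: y depends on x only through a symmetric nonlinear function.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=500)
y = x ** 2

pearson = np.corrcoef(x, y)[0, 1]
dcor = distance_correlation(x, y)
print(f"Pearson correlation:  {pearson:.3f}")
print(f"Distance correlation: {dcor:.3f}")
```

In this simulated setting the Pearson correlation is expected to be close to zero because the relationship is symmetric around zero, while the distance correlation stays clearly positive. The estimator builds full n-by-n distance matrices, so this sketch is only practical for moderate sample sizes.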

Identifier: oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/43305
Date: 06 November 2021
Creators: Wu, Peitao
Contributors: Liu, Ching-Ti
Source Sets: Boston University
Language: en_US
Detected Language: English
Type: Thesis/Dissertation
