Global ETD Search

Kernel Machine Methods for Risk Prediction with High Dimensional Data

Understanding the relationship between genomic markers and complex disease could have a profound impact on medicine, but the large number of potential markers can make it hard to differentiate true biological signal from noise and false positive associations. A standard approach for relating genetic markers to complex disease is to test each marker for its association with disease outcome by comparing disease cases to healthy controls. It would be cost-effective to share control groups across studies of many different diseases; however, this can be problematic when the controls are genotyped on a platform different from the one used for cases. Because different platforms genotype different SNPs, imputation is needed to provide full genomic coverage, but it introduces differential measurement error. In Chapter 1, we consider the effects of this differential error on association tests. We quantify the inflation in Type I error by comparing two healthy control groups drawn from the same cohort study but genotyped on different platforms, and assess several methods for mitigating this error.

Analyzing genomic data one marker at a time can effectively identify associations, but the resulting lists of significant SNPs or differentially expressed genes can be hard to interpret. Integrating prior biological knowledge into risk prediction with such data by grouping genomic features into pathways reduces the dimensionality of the problem and could improve models by making them more biologically grounded and interpretable. The kernel machine framework has been proposed to model pathway effects because it allows nonlinear associations between the genes in a pathway and disease risk. In Chapter 2, we propose kernel machine regression under the accelerated failure time model. We derive a pseudo-score statistic for testing and a risk score for prediction using genes in a single pathway. We propose omnibus procedures that alleviate the need to prespecify the kernel and allow the data to drive the complexity of the resulting model.

In Chapter 3, we extend the single-pathway risk prediction methods to multiple pathways, using a multiple kernel learning approach to select important pathways and efficiently combine information across them.

high dimensional data

Identifier	oai:union.ndltd.org:harvard.edu/oai:dash.harvard.edu:1/9793867
Date	22 October 2012
Creators	Sinnott, Jennifer Anne
Contributors	Cai, Tianxi
Publisher	Harvard University
Source Sets	Harvard University
Language	en_US
Detected Language	English
Type	Thesis or Dissertation
Rights	open
