
Statistical Methods for Epigenetic Data

DNA methylation plays a crucial role in human health, especially in cancer. Traditional DNA methylation analysis aims to identify CpGs/genes with differential methylation (DM) between experimental groups. Differential variability (DV) has recently been observed to contribute to cancer heterogeneity and has also been shown to be essential for detecting early DNA methylation alterations, notably epigenetic field defects. Moreover, studies have demonstrated that environmental factors may modify the effect of DNA methylation on health outcomes, or vice versa. Therefore, this dissertation develops new statistical methods for epigenetic data, focusing on DV and interactions, for which efficient analytical tools are lacking. First, because neighboring CpG sites are usually highly correlated, we introduced a new method to detect differentially methylated regions (DMRs) that combines DM and DV signals between diseased and non-diseased groups. Next, using both DM and DV signals, we considered the problem of identifying epigenetic field defects, where CpG-site-level DM and DV signals are weak and hard to detect with existing methods. We proposed a weighted epigenetic distance-based method that accumulates CpG-site-level DM and DV signals within a gene. Here, DV signals were captured by a pseudo-data matrix constructed from centered quadratic methylation measures. CpG-site-level association signal annotations were introduced as weights in the distance calculation to up-weight signal CpGs and down-weight noise CpGs, further boosting study power. Lastly, we extended the weighted epigenetic distance-based method to incorporate DNA methylation-by-environment interactions when detecting the overall association between DNA methylation and health outcomes. To capture these interactions, a pseudo-data matrix was constructed from cross-product terms between DNA methylation and environmental factors.
Extensive simulation studies and applications to multiple real DNA methylation datasets demonstrated the superior performance of the proposed methods.
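The abstract gives no formulas, so the following is only a minimal Python sketch of the general idea, not the dissertation's implementation. It assumes that (a) "centered quadratic methylation measures" are squared deviations from each CpG site's overall mean, (b) site weights are supplied externally (for example, derived from site-level association annotations), (c) interaction terms are products of centered methylation and a centered environmental factor, and (d) gene-level association is tested with a PERMANOVA-style pseudo-F and label permutation, which is a swapped-in technique. All function names (`pseudo_data`, `weighted_distance`, `pseudo_f`, `gene_test`) are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def pseudo_data(meth, env=None):
    """Hypothetical helper: build one gene's pseudo-data rows.

    meth[i][j] is the methylation value of sample i at CpG site j.
    Block 1: raw values (carry the mean-difference, DM, signal).
    Block 2: squared deviations from each site's overall mean (DV signal;
             assumed definition of "centered quadratic" measures).
    Block 3 (only if env is given): centered methylation x centered
             environment cross-products (assumed interaction terms).
    """
    n_sites = len(meth[0])
    mu = [mean(row[j] for row in meth) for j in range(n_sites)]
    e_mu = mean(env) if env is not None else 0.0
    rows = []
    for i, row in enumerate(meth):
        dev = [row[j] - mu[j] for j in range(n_sites)]
        new = list(row) + [d * d for d in dev]
        if env is not None:
            new += [d * (env[i] - e_mu) for d in dev]
        rows.append(new)
    return rows


def weighted_distance(a, b, w):
    """Weighted Euclidean distance; a larger w[k] up-weights column k."""
    return sum(wk * (x - y) ** 2 for wk, x, y in zip(w, a, b)) ** 0.5


def pseudo_f(dist, groups):
    """PERMANOVA-style pseudo-F for a categorical grouping of samples."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][k] ** 2 for i, k in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(dist[i][k] ** 2 for i, k in combinations(idx, 2)) / len(idx)
    a = len(labels)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))


def gene_test(meth, groups, site_weights, env=None, n_perm=999, seed=0):
    """Return (pseudo-F, permutation p-value) for one gene (hypothetical)."""
    rows = pseudo_data(meth, env)
    n_blocks = len(rows[0]) // len(site_weights)
    w = list(site_weights) * n_blocks  # reuse each site's weight in every block
    n = len(rows)
    dist = [[weighted_distance(rows[i], rows[k], w) for k in range(n)]
            for i in range(n)]
    observed = pseudo_f(dist, groups)
    rng = random.Random(seed)
    perm = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # permute outcome labels; pseudo-data stay fixed
        if pseudo_f(dist, perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Because the squared-deviation block is computed from overall site means, a group that is more variable at a site also has larger average pseudo-values there, so a distance test that is sensitive to mean shifts can pick up DV as well as DM. Repeating each site's weight across all blocks is just one simple choice in this sketch; other weighting schemes are equally plausible.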

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/d8-9z87-ts07
Date: January 2019
Creators: Wang, Ya
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses
