Statistical methods to identify differentially methylated regions using Illumina methylation arrays

DNA methylation is an epigenetic mechanism that usually occurs at CpG sites in the genome. Both sequencing and array-based techniques are available to detect methylation patterns. Whole-genome bisulfite sequencing is the most comprehensive approach but is often cost-prohibitive. Array-based methods, such as Illumina methylation arrays, are an affordable alternative, but they assess only a fixed set of genomic loci. Differentially methylated regions (DMRs) are genomic regions whose methylation patterns across multiple CpG sites are associated with a phenotype. Because methylation at nearby sites tends to be correlated, analyzing sets of sites rather than individual sites may increase the power to detect methylation differences and reduce the multiple-testing burden. Several statistical approaches exist for identifying DMRs, and a few prior publications have compared the performance of commonly used DMR methods. However, as far as we know, no comprehensive comparisons have been made based on genome-wide simulation studies.

This dissertation provides recommendations for DMR analysis based on genome-wide evaluations of existing DMR tools and presents the development of a novel approach to increase the power to identify DMRs with clinical value in genomic research. The second chapter presents genome-wide null simulations comparing five commonly used array-based DMR methods (Bumphunter, comb-p, DMRcate, mCSEA, and coMethDMR) and identifies coMethDMR as the only approach that consistently yields appropriate Type I error control. We suggest that a genome-wide evaluation of false positive (FP) rates is critical for DMR methods. The third chapter develops a novel principal component analysis-based DMR method (denoted DMRPC) and demonstrates that it can identify DMRs using genome-wide methylation arrays with FP rates well controlled at the 0.05 level. Compared to coMethDMR, DMRPC is a robust and powerful DMR tool that can examine more genomic regions and extract signals from low-correlation regions. The fourth chapter applies DMRPC to two real-world datasets and identifies novel DMRs that are associated with several inflammatory markers.

Identifier: oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/48027
Date: 08 February 2024
Creators: Zheng, Yuanchao
Contributors: Logue, Mark W.
Source Sets: Boston University
Language: en_US
Detected Language: English
Type: Thesis/Dissertation
Rights: Attribution 4.0 International, http://creativecommons.org/licenses/by/4.0/
