
Methods for DNA Methylation Sequencing Analysis and their Application on Cancer Data

The fundamental subject of this thesis is the development of tools for the analysis of DNA methylation data and their application to bisulfite sequencing data comprising a large number of samples. DNA methylation is one of the major epigenetic modifications. It affects the cytosines of the DNA and is essential for the normal development of cells and tissues. Unusual alterations are associated with a variety of diseases; in cancerous tissues especially, global changes in the DNA methylation level have been detected. To sequence DNA methylation at single-nucleotide resolution, the DNA is treated with sodium bisulfite before sequencing, whereby unmethylated cytosines are read as thymines. Thus, specialized techniques are required to process and analyze this kind of data.
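The effect of bisulfite treatment on a read can be illustrated with a minimal sketch. The function name and the set of methylated positions are hypothetical inputs chosen for illustration; only a single strand is modeled.

```python
from typing import Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Simulate how a sequence is read after bisulfite treatment.

    Unmethylated cytosines are read as thymines; methylated cytosines
    stay C. `methylated` holds 0-based positions (hypothetical input).
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


# The cytosine at position 1 is methylated and survives;
# the cytosines at positions 4 and 5 are read as thymines.
converted = bisulfite_convert("ACGTCCGA", methylated={1})
print(converted)
```

Because an unmethylated C and a genuine T become indistinguishable in the read, the conversion has to be taken into account when aligning against the reference genome.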
Here, the bisulfite analysis toolkit BAT is introduced, which is designed to facilitate a quick analysis of bisulfite-treated DNA methylation sequencing data. It covers all steps from processing raw sequencing data up to the calling of differential DNA methylation. At the beginning of the analysis, sodium bisulfite-treated sequence data are aligned, and DNA methylation rates are called for each covered cytosine in the reference genome. Subsequently, BAT integrates annotation data and performs basic analyses, i.e., methylation rate distribution plots and hierarchical clustering of the samples. In addition, differentially methylated regions are called and statistics of the called regions are created automatically. Finally, the integration of DNA methylation and gene expression data is covered by the calculation of correlating regions.
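As a rough illustration of what a methylation rate at a covered cytosine means, the following sketch assumes the common definition: the fraction of aligned reads that still show a C at a reference cytosine. The function and its arguments are hypothetical and are not BAT's interface.

```python
from typing import Optional


def methylation_rate(n_c: int, n_t: int) -> Optional[float]:
    """Methylation rate at one reference cytosine.

    n_c: reads showing C (cytosine protected from conversion, i.e. methylated)
    n_t: reads showing T (cytosine converted, i.e. unmethylated)
    Returns None when the position is not covered by any read.
    """
    covered = n_c + n_t
    if covered == 0:
        return None
    return n_c / covered


# Three of four reads kept the C, so the position is 75% methylated.
print(methylation_rate(3, 1))
```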
Secondly, a novel algorithm, metilene, for the calculation of differentially methylated regions (DMRs) between two groups of samples is introduced. Existing methods are limited in terms of detection sensitivity as well as time and memory consumption. Our approach is based on circular binary segmentation, using a scoring function to detect sub-regions that show a stronger difference between the mean methylation levels of the two groups than the surrounding background. These sub-regions are tested for significant differences using a two-dimensional Kolmogorov-Smirnov test (2D-KS test) [Fasano 1987], taking all samples of each group into account. The use of the non-parametric 2D-KS test avoids assumptions about a background distribution. Furthermore, the two dimensions of the problem, i.e., (i) the detection of a region such that (ii) the methylation rates of the samples in the groups are significantly different, are taken into account in a single test. The algorithm calls DMRs in short time both on single-sample comparisons and on about 50 samples per group. Furthermore, it works on whole-genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) data and is able to estimate missing data points from the methylation rates of other samples in the group. Benchmarks on simulated and real data sets show that metilene outperforms other existing methods and is especially suitable for noisy data sets, as often found, for example, in cancer analysis.
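The idea of finding a sub-region whose group mean difference stands out from the background can be sketched as follows. This is a brute-force stand-in for a single segmentation step, not metilene's scoring function, and it omits the 2D-KS test entirely; all names and the toy data are assumptions made for illustration.

```python
from statistics import mean


def mean_differences(group_a, group_b):
    """Per-position difference of the group mean methylation rates.

    Each group is a list of samples; each sample is a list of rates
    over the same positions.
    """
    return [mean(a) - mean(b) for a, b in zip(zip(*group_a), zip(*group_b))]


def best_subregion(diffs):
    """Return the half-open interval (start, end) whose summed deviation
    from the overall mean difference is largest in absolute value."""
    background = mean(diffs)
    best, best_score = (0, 1), -1.0
    for start in range(len(diffs)):
        running = 0.0
        for end in range(start + 1, len(diffs) + 1):
            running += diffs[end - 1] - background
            if abs(running) > best_score:
                best, best_score = (start, end), abs(running)
    return best


# Toy data: two samples per group, five positions (made-up values).
group_a = [[0.1, 0.1, 0.9, 0.9, 0.1], [0.2, 0.1, 0.8, 0.9, 0.1]]
group_b = [[0.1, 0.1, 0.1, 0.1, 0.1], [0.2, 0.1, 0.2, 0.1, 0.1]]
region = best_subregion(mean_differences(group_a, group_b))
print(region)
```

In a full method, a candidate region found this way would still have to pass a significance test across all samples of both groups before being reported as a DMR.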
In the framework of this thesis, the previously introduced methods and algorithms are used to analyze a WGBS data set of two different subtypes of germinal-center-derived B-cell lymphomas and healthy controls. In both lymphoma subgroups, genome-wide hypomethylation was found, with the exception of a specific type of promoter region, i.e., poised promoters, which were frequently found to be hypermethylated. Using the previously presented algorithm, DMRs were called between the three entities. A strong enrichment of DMRs immediately downstream of the transcription start site was observed, indicating the regulatory relevance of these regions. The integration of gene expression data from the same samples revealed that a considerable number of the DMRs showed a significant correlation between gene expression and DNA methylation. Finally, transcription factor binding sites and mutation data were combined with the methylation and expression data analysis. This identified strongly altered signaling pathways and cancer-subtype-specific genes. Furthermore, the data integration indicates that mutations and DNA methylation changes may act complementarily to one another.
Finally, findings from the lymphoma study regarding the hypermethylation of poised promoters in cancer were extended to a large data set comprising a variety of cancers. We could show that the DNA methylation at a small set of frequently poised regions, relative to the background methylation level, is sufficient to classify almost all samples into cancer or non-cancer samples based on DNA methylation data from 450k BeadChips. In addition, we found that the increase in methylation co-occurs with upregulated expression of several poised-promoter-regulated genes in almost all fresh cancer samples, implying a de-poising of poised regions. This upregulated gene expression is in contrast to the silencing of those genes in cancer cell lines, indicating that the upregulated gene expression might be a temporary state that possibly contributes to carcinogenesis.

Identifier: oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:14725
Date: 17 May 2016
Creators: Kretzmer, Helene
Contributors: Stadler, Peter F., Vingron, Martin, Universität Leipzig
Source Sets: Hochschulschriftenserver (HSSS) der SLUB Dresden
Language: English
Detected Language: English
Type: doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text
Rights: info:eu-repo/semantics/openAccess
