DNA methylation plays a key role in disease analysis, especially for studies that compare
known large-scale differences in CpG sites, such as cancer/normal studies or between-tissue
studies. However, before any analysis can be done, methylation data must be normalized and
preprocessed. A useful preprocessing pipeline for large-scale comparisons
is Functional Normalization (FunNorm; Fortin et al., 2014), implemented in the minfi
package in R. In FunNorm, the univariate quantiles of the methylated and unmethylated
signal values in the raw data are used to preprocess the data. Although FunNorm
has been shown to outperform other preprocessing and normalization methods for
these types of studies, it does not account for the correlation between the methylated and
unmethylated signals; the focus of this paper is to improve upon FunNorm by
taking this correlation into account. To do so, this study uses the concept of a bivariate
quantile. The partial least squares method is then applied to the resulting bivariate
quantiles as part of the preprocessing. The raw datasets used for this research
were collected from the European Molecular Biology Laboratory - European Bioinformatics
Institute (EMBL-EBI) website. The results from this preprocessing algorithm were then
compared with those from FunNorm. Drawbacks, limitations, and future
research are then discussed.
Thesis / Master of Science (MSc)
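The limitation described in the abstract can be illustrated with a minimal sketch. It is not the minfi implementation of FunNorm and not the method developed in the thesis: the intensities are simulated, and the helper names (`quantiles`, `pearson`) and the simulation settings are assumptions made only for illustration. The sketch shows that the univariate quantiles of the two channels stay exactly the same when the pairing between methylated and unmethylated values is destroyed, even though the correlation between the channels changes.

```python
import random

def quantiles(values, probs):
    """Empirical quantiles via linear interpolation between order statistics."""
    xs = sorted(values)
    n = len(xs)
    out = []
    for p in probs:
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(xs[lo] * (1 - frac) + xs[hi] * frac)
    return out

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)

# Hypothetical data for one sample: a total intensity per probe is split into
# methylated and unmethylated signal by a simulated methylation fraction.
total = [rng.uniform(2000, 10000) for _ in range(500)]
frac_meth = [rng.betavariate(0.5, 0.5) for _ in range(500)]
meth = [t * b for t, b in zip(total, frac_meth)]
unmeth = [t * (1 - b) for t, b in zip(total, frac_meth)]

# Univariate (marginal) quantiles, computed separately for each channel.
probs = [i / 10 for i in range(11)]
meth_q = quantiles(meth, probs)
unmeth_q = quantiles(unmeth, probs)
rho = pearson(meth, unmeth)

# Break the pairing: the marginal quantiles cannot change, the correlation does.
shuffled = unmeth[:]
rng.shuffle(shuffled)
shuffled_q = quantiles(shuffled, probs)
rho_shuffled = pearson(meth, shuffled)

print("quantiles unchanged after shuffling:", shuffled_q == unmeth_q)
print("correlation (paired):   %.3f" % rho)
print("correlation (shuffled): %.3f" % rho_shuffled)
```

Because the marginal quantiles are identical in both cases, any summary built only from them cannot distinguish the paired data from the shuffled data; a bivariate summary is one way to retain that information.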
Identifier | oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/26203 |
Date | January 2021 |
Creators | Yacas, Clifford |
Contributors | Canty, Angelo, Mathematics and Statistics |
Source Sets | McMaster University |
Language | English |
Detected Language | English |
Type | Thesis |