Return to search

A data cleaning and annotation framework for genome-wide studies.

M.S. / Computer Science and Engineering / Genome-wide studies are sensitive to the quality of annotation data included for analyses and they often involve overlaying both computationally derived and experimentally generated data onto a genomic scaffold. A framework for successful integration of data from diverse sources needs to address, at a minimum, the conceptualization of the biological identity in the data sources, the relationship between the sources in terms of the data present, the independence of the sources and, any discrepancies in the data. The outcome of the process should either resolve or incorporate these discrepancies into downstream analyses. In this thesis we identify factors that are important in detecting errors within and between sources and present a generalized framework to detect discrepancies. An implementation of our workflow is used to demonstrate the utility of the approach in the construction of a genome-wide mouse transcription factor binding map and in the classification of Single nucleotide polymorphisms. We also present the impact of these discrepancies on downstream analyses. The framework is extensible and we discuss future directions including summarization of the discrepancies in a biological relevant manner.

Identiferoai:union.ndltd.org:OREGON/oai:content.ohsu.edu:etd/263
Date11 1900
CreatorsRanjani Ramakrishnan
PublisherOregon Health & Science University
Source SetsOregon Health and Science Univ. Library
LanguageEnglish
Detected LanguageEnglish
TypeText
FormatNeeds Adobe Acrobat Reader to view., pdf, 537.914 KB
Rightshttp://www.ohsu.edu/library/etd_rights.shtml

Page generated in 0.0021 seconds