A data cleaning and annotation framework for genome-wide studies.

M.S. / Computer Science and Engineering / Genome-wide studies are sensitive to the quality of the annotation data included in their analyses, and they often involve overlaying both computationally derived and experimentally generated data onto a genomic scaffold. A framework for successful integration of data from diverse sources needs to address, at a minimum, the conceptualization of biological identity in the data sources, the relationship between the sources in terms of the data present, the independence of the sources, and any discrepancies in the data. The outcome of the process should either resolve these discrepancies or incorporate them into downstream analyses. In this thesis we identify factors that are important in detecting errors within and between sources and present a generalized framework to detect discrepancies. An implementation of our workflow is used to demonstrate the utility of the approach in the construction of a genome-wide mouse transcription factor binding map and in the classification of single nucleotide polymorphisms. We also present the impact of these discrepancies on downstream analyses. The framework is extensible, and we discuss future directions, including summarization of the discrepancies in a biologically relevant manner.

Genomics; Bioinformatics

Identifier	oai:union.ndltd.org:OREGON/oai:content.ohsu.edu:etd/263
Date	11 1900
Creators	Ranjani Ramakrishnan
Publisher	Oregon Health & Science University
Source Sets	Oregon Health and Science Univ. Library
Language	English
Detected Language	English
Type	Text
Format	pdf, 537.914 KB (requires Adobe Acrobat Reader to view)
Rights	http://www.ohsu.edu/library/etd_rights.shtml
