
A computational framework for the identification, cataloging, and classification of evolutionary conserved genomic DNA

Thesis (S.M.)--Harvard-MIT Division of Health Sciences and Technology, 2004. / Includes bibliographical references (leaves 27-29). / Evolutionarily conserved genomic regions (ecores) are understudied, yet they comprise a very large percentage of the human genome. Highly conserved human-mouse non-coding ecores, for example, are more abundant within the human genome than the regions currently estimated to encode proteins. Subsets of these ecores also exhibit conservation that extends across several species. These genomic regions have survived millions of years of evolution even though they do not appear to encode proteins directly. Their survival compels us to investigate their potential function. Developing a computational framework for classifying and clustering these regions may be the first step toward understanding that function. The need for a standardized framework is underscored by the explosive growth in the number of publicly available, fully sequenced genomes and by the diverse set of methodologies used to generate cross-species alignments. This project describes the design and implementation of a system for the identification, classification, and cataloging of ecores across multiple species. A key feature of this system is its ability to quickly incorporate new genomes and assemblies as they become available. Additionally, the system provides investigators with a feature-rich user interface that facilitates the retrieval of ecores based on a wide range of parameters. The system returns a dynamically annotated list of evolutionarily conserved regions, which is used as input to several classification schemes aimed at identifying families of ecores that share similar features, including depth of evolutionary conservation, position relative to known genes, sequence similarity, and content of transcription factor binding sites. Families of ecores have already been retrieved by the system and clustered using this feature space, and are currently awaiting biological validation. / by Sunil K. Saluja. / S.M.

Identifier: oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/28590
Date: January 2004
Creators: Saluja, Sunil K. (Sunil Kumar), 1968-
Contributors: Isaac S. Kohane; Harvard University--MIT Division of Health Sciences and Technology
Publisher: Massachusetts Institute of Technology
Source Sets: M.I.T. Theses and Dissertation
Language: en_US
Detected Language: English
Type: Thesis
Format: 29 leaves, 1741161 bytes, 1741965 bytes, application/pdf
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
