Thesis (S.M.)--Harvard-MIT Division of Health Sciences and Technology, 2004. / Includes bibliographical references (leaves 27-29). / Evolutionarily conserved genomic regions (ecores) are understudied, yet they comprise a very large percentage of the human genome. Highly conserved human-mouse non-coding ecores, for example, are more abundant within the human genome than the regions currently estimated to encode proteins. Subsets of these ecores also exhibit conservation that extends across several species. These genomic regions have survived millions of years of evolution even though they do not appear to encode proteins directly. Their survival compels us to investigate their potential function. Developing a computational framework for the classification and clustering of these regions may be the first step toward understanding that function. The need for a standardized framework is underscored by the explosive growth in the number of publicly available, fully sequenced genomes and by the diverse set of methodologies used to generate cross-species alignments. This project describes the design and implementation of a system for the identification, classification and cataloguing of ecores across multiple species. A key feature of this system is its ability to quickly incorporate new genomes and assemblies as they become available. Additionally, the system provides investigators with a feature-rich user interface that facilitates the retrieval of ecores based on a wide range of parameters. The system returns a dynamically annotated list of evolutionarily conserved regions, which is used as input to several classification schemes aimed at identifying families of ecores that share similar features, including depth of evolutionary conservation, position relative to known genes, sequence similarity, and content of transcription factor binding sites.
Families of ecores have already been retrieved by the system and clustered using this feature space, and are currently awaiting biological validation. / by Sunil K. Saluja. / S.M.
Identifier | oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/28590 |
Date | January 2004 |
Creators | Saluja, Sunil K. (Sunil Kumar), 1968- |
Contributors | Isaac S. Kohane, Harvard University--MIT Division of Health Sciences and Technology |
Publisher | Massachusetts Institute of Technology |
Source Sets | M.I.T. Theses and Dissertation |
Language | en_US |
Detected Language | English |
Type | Thesis |
Format | 29 leaves, 1741161 bytes, 1741965 bytes, application/pdf |
Rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission., http://dspace.mit.edu/handle/1721.1/7582 |