Gene regulation lies at the heart of most biological processes, and transcription factors (TFs) are the key molecules that control tissue-specific gene expression. In higher eukaryotes, transcription factors control gene expression by binding regulatory DNA segments called cis-regulatory modules (CRMs). The increasing number of sequenced genomes of multicellular eukaryotes, together with high-throughput data such as whole-genome microarray expression data, allows for systematic characterization of the CRMs that control gene expression. A first step towards understanding gene regulation is the identification of the regulatory elements present in the genome. We take advantage of the large database of spatio-temporal patterns of gene expression in D. melanogaster embryogenesis to identify sets of developmentally co-expressed genes. We developed a computational method that identifies DNA binding sites for transcription factors from families of co-regulated genes that are expressed during Drosophila embryo development. This method discovers over-represented motifs in a set of co-regulated genes using exhaustive motif enumeration. Clustering the predicted motifs identifies the CRMs, which assist in translating a combinatorial code of TF inputs into a specific gene expression output. The predicted CRMs were verified experimentally by searching the whole genome for them and establishing the expression patterns of the genes associated with these CRMs. It is well known that gene expression is substantially controlled through CRMs and that these key regulatory sequences are conserved in related species. The conservation of CRMs can be studied by comparing related genomes, and alignment methods are widely used computational tools for comparing sequences. However, in distantly related species the CRM sequences are simply not alignable.
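The motif-discovery step described above (exhaustive enumeration of candidate motifs and selection of the over-represented ones) can be illustrated with a minimal sketch. The abstract does not state which enrichment statistic the thesis uses, so the fold-enrichment score with a pseudocount below is an assumption, and the names `count_kmers`, `enriched_kmers`, `foreground`, `background` and `pseudo` are hypothetical.

```python
from collections import Counter
from itertools import product


def count_kmers(seqs, k):
    """For every k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        # A set ensures each word is counted once per sequence.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    return counts


def enriched_kmers(foreground, background, k=6, pseudo=1.0):
    """Score all 4**k DNA words by fold enrichment, highest first.

    Assumption: enrichment is the ratio of the fraction of foreground
    sequences containing the word to the fraction of background
    sequences containing it, smoothed with a pseudocount. The thesis
    may use a different statistic.
    """
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    n_fg, n_bg = len(foreground), len(background)
    scores = []
    # Exhaustive enumeration: every possible word of length k is scored,
    # including words that never occur in the input.
    for word in map("".join, product("ACGT", repeat=k)):
        fg_rate = (fg[word] + pseudo) / (n_fg + pseudo)
        bg_rate = (bg[word] + pseudo) / (n_bg + pseudo)
        scores.append((fg_rate / bg_rate, word))
    scores.sort(reverse=True)
    return scores


if __name__ == "__main__":
    # Toy sequences, not real Drosophila data.
    co_regulated = ["AAGATACC", "CCGATATT", "TTGATAGG"]
    control = ["ACGTACGT", "TTTTCCCC", "GGGGAAAA"]
    for score, word in enriched_kmers(co_regulated, control, k=4)[:3]:
        print(word, round(score, 2))
```

Exhaustive enumeration is practical here because the word space grows as 4**k, which stays small for the short word lengths typical of transcription factor binding sites.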
To identify similar CRMs in distantly related species, we developed a non-alignment-based method. This method is based on word frequencies, and the given sequences are compared using a Poisson-based metric. Starting with a set of CRMs involved in Drosophila early embryo development, we show here that our non-alignment method successfully detects similar CRMs in distantly related species (D. ananassae, D. pseudoobscura, D. willistoni, D. mojavensis, D. virilis, D. grimshawi). This method proved efficient in discriminating functional CRMs from non-functional ones.
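As a rough illustration of comparing two sequences by word frequencies without aligning them, the sketch below scores shared words by how unlikely their counts are under a Poisson model. The abstract does not define the Poisson-based metric used in the thesis, so this particular score is an assumption, as are the names `word_counts`, `poisson_sf`, `poisson_similarity` and `bg_freq` and the uniform background default.

```python
import math
from collections import Counter


def word_counts(seq, k):
    """Count overlapping words of length k in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def poisson_sf(c, lam):
    """P(X >= c) for X ~ Poisson(lam)."""
    if c <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(c))
    return max(0.0, 1.0 - cdf)


def poisson_similarity(seq_a, seq_b, bg_freq=None, k=3):
    """Illustrative alignment-free similarity between two sequences.

    Assumption: for every word present in both sequences, the expected
    count is background frequency times the number of word positions,
    and the word contributes -log of the joint Poisson tail probability
    of its observed counts. Words absent from either sequence add 0.
    """
    bg_freq = bg_freq or {}
    uniform = 1.0 / 4 ** k  # fallback when no background model is given
    ca, cb = word_counts(seq_a, k), word_counts(seq_b, k)
    na, nb = len(seq_a) - k + 1, len(seq_b) - k + 1
    score = 0.0
    for w in ca.keys() & cb.keys():
        p = bg_freq.get(w, uniform)
        pa = poisson_sf(ca[w], p * na)
        pb = poisson_sf(cb[w], p * nb)
        # Floor avoids log(0) when a tail probability underflows.
        score += -math.log(max(pa * pb, 1e-300))
    return score


if __name__ == "__main__":
    # Toy sequences, not real CRMs.
    crm = "GATAGATAGATACC"
    candidate = "TTGATAGATAGATA"
    unrelated = "CCCCCCGGGGGGCC"
    print(poisson_similarity(crm, candidate))
    print(poisson_similarity(crm, unrelated))
```

Because the score depends only on word counts, the order and position of words within each sequence do not matter, which is what makes this kind of comparison usable when the sequences cannot be aligned.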
Identifier | oai:union.ndltd.org:DRESDEN/oai:qucosa.de:bsz:14-qucosa-25057 |
Date | 09 November 2009 |
Creators | Arunachalam, Manonmani |
Contributors | Technische Universität Dresden, Fakultät Mathematik und Naturwissenschaften, Max-Planck-Institut für Molekulare Zellbiologie und Genetik, Developmental Biology, Dr. Pavel Tomancak, Prof. Daniel J Muller, Prof. Michael Schroeder |
Publisher | Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Detected Language | English |
Type | doc-type:doctoralThesis |
Format | application/pdf |