Understanding the connection between an organism’s genotype and its phenotype is a key question in evolutionary biology and genetics. Many changes in morphological or other complex phenotypic traits have been shown to result from changes in the expression patterns of key developmental genes rather than from changes in the genes themselves. Such altered gene expression often arises from changes in gene regulatory regions. This usually involves the loss of important transcription factor (TF) binding sites within these regions, because the interaction between TFs and specific sites on the DNA is a key element of gene regulation.
An established approach for the genome-wide mapping of genomic regions to phenotypes is the Forward Genomics framework. It compares the genomic sequences of species with and without the phenotype of interest, based on two ideas: first, the initial loss of a phenotype relaxes selection on all genomic regions related to that phenotype, and second, such a loss can happen independently in multiple species. The regions of interest are therefore those that diverged specifically in the phenotype-loss species. Although this principle is general, the current implementation is well suited only for identifying phenotype-related coding regions of genes and has limited applicability to regulatory regions. The reason is its reliance on sequence conservation as the divergence measure, which does not accurately capture the functional divergence of regulatory elements.
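The Forward Genomics principle described above can be illustrated with a minimal sketch, assuming each region is given as a reference sequence plus one aligned sequence per species. The percent-identity measure and the difference-of-means score are simplifications chosen for illustration; they are not necessarily the statistics used by the published Forward Genomics pipeline. All species and region names are hypothetical.

```python
from statistics import mean

def percent_identity(ref: str, seq: str) -> float:
    """Fraction of non-gap reference columns at which `seq` carries the same base."""
    if len(ref) != len(seq):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(r, s) for r, s in zip(ref.upper(), seq.upper()) if r != "-"]
    if not columns:
        return 0.0
    return sum(r == s for r, s in columns) / len(columns)

def divergence_score(ref, aligned, loss_species):
    """Mean identity of phenotype-preserving species minus mean identity of
    phenotype-loss species; larger values mean the region diverged
    specifically in the loss species (simplified, illustrative statistic)."""
    keep = [percent_identity(ref, s) for sp, s in aligned.items() if sp not in loss_species]
    loss = [percent_identity(ref, s) for sp, s in aligned.items() if sp in loss_species]
    return mean(keep) - mean(loss)

def rank_regions(regions, loss_species):
    """regions: {name: (reference_seq, {species: aligned_seq})}; best candidates first."""
    scores = {name: divergence_score(ref, aln, loss_species)
              for name, (ref, aln) in regions.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: r1 diverged only in the loss species, r2 is conserved everywhere.
regions = {
    "r1": ("ACGTACGT", {"sp_a": "ACGTACGT", "sp_b": "ACGTACGT",
                        "sp_c": "TTGTACTT", "sp_d": "TTGTACTT"}),
    "r2": ("ACGTACGT", {"sp_a": "ACGTACGT", "sp_b": "ACGTACGT",
                        "sp_c": "ACGTACGT", "sp_d": "ACGTACGT"}),
}
loss = {"sp_c", "sp_d"}
print(rank_regions(regions, loss))  # ['r1', 'r2']
```

Because the score rests purely on sequence identity, a regulatory region whose binding sites were reshuffled but kept functional would still look diverged, which is the limitation the paragraph points out.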
In this thesis, I developed REforge, a novel implementation of the Forward Genomics principle that takes functional information about regulatory elements into account in the form of known phenotype-related TFs. By considering the flexible organization of TF binding sites within a regulatory region, in terms of both binding strength and site order, REforge abstracts from the region’s sequence level to its functional level. Functional divergence of regulatory regions is thus compared directly to phenotypic divergence, which greatly improves performance over Forward Genomics, as I demonstrated on synthetic and real data.
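The idea of moving from the sequence level to a functional level can be sketched as follows, assuming TFs are described by position weight matrices (PWMs) and that a region's "function" for a set of TFs is approximated by the summed exponentiated PWM scores over all windows on both strands. Summing over every window makes the measure independent of site order while weighting strong sites more than weak ones. This affinity model, the log transform, and the difference-of-means comparison are illustrative assumptions, not REforge's actual scoring; the motif and species names are hypothetical.

```python
import math
from statistics import mean

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def build_pwm(sites, pseudo=0.5, background=0.25):
    """Log-odds position weight matrix from equal-length binding-site strings."""
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        total = len(column) + 4 * pseudo
        pwm.append({b: math.log((column.count(b) + pseudo) / total / background)
                    for b in BASES})
    return pwm

def window_scores(seq, pwm):
    """Log-odds score of every motif-length window that contains only A/C/G/T."""
    k = len(pwm)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if all(b in BASES for b in window):
            yield sum(col[b] for col, b in zip(pwm, window))

def affinity(seq, pwm):
    """Order-independent affinity: sum of exp(score) over all windows on both strands."""
    s = seq.replace("-", "").upper()          # drop alignment gaps
    rc = s.translate(COMPLEMENT)[::-1]        # reverse complement
    return sum(math.exp(x) for strand in (s, rc) for x in window_scores(strand, pwm))

def functional_divergence(aligned, pwms, loss_species):
    """Mean log(1 + total affinity) of phenotype-preserving species minus that of loss species."""
    f = {sp: math.log1p(sum(affinity(seq, p) for p in pwms)) for sp, seq in aligned.items()}
    keep = [v for sp, v in f.items() if sp not in loss_species]
    loss = [v for sp, v in f.items() if sp in loss_species]
    return mean(keep) - mean(loss)

# Hypothetical motif and alignment: the site is intact in sp_a/sp_b and destroyed in sp_c/sp_d.
pwm = build_pwm(["CACGTG", "CACGTG", "CACGTG", "CATGTG"])
aligned = {"sp_a": "AAAACACGTGAAAA", "sp_b": "AAAACACGTGAAAA",
           "sp_c": "AAAACA--TGAAAA", "sp_d": "AAAAAAAAAAAAAA"}
print(functional_divergence(aligned, [pwm], {"sp_c", "sp_d"}) > 0)  # True
```

A region in which a site merely moved to another position would keep roughly the same affinity under this measure, whereas a percent-identity score would register it as diverged.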
Additionally, I developed TFforge, which follows the same approach but aims to identify the TFs relevant to a given phenotype. Given a multi-species alignment with a phenotype annotation and a set of regulatory regions, TFforge systematically searches for TFs whose changes in binding affinity between species fit the phenotype signature. Its output is a ranking of the TFs by their level of correspondence. I provide a proof of concept for this approach on both biological data and artificially generated regions. TFforge can be used as a standalone analysis tool and also to generate the input set of TFs for a subsequent REforge analysis. I demonstrate that REforge combined with TFforge substantially outperforms standard Forward Genomics even without prior knowledge of the relevant TFs.
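The ranking step can be sketched as follows, assuming per-species binding affinities have already been computed for every TF and region, and using the Pearson correlation between log affinity and a binary phenotype label (averaged over regions) as the "level of correspondence". Both the input layout and this correlation score are assumptions made for illustration; TFforge's actual statistic is not specified here. TF, region, and species names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def rank_tfs(affinities, phenotype):
    """affinities: {tf: {region: {species: affinity}}};
    phenotype: {species: 1 if the phenotype is present, else 0}.
    Returns (tf, score) pairs sorted by mean per-region correlation, highest first."""
    species = sorted(phenotype)
    y = [phenotype[s] for s in species]
    scores = {}
    for tf, regions in affinities.items():
        per_region = [pearson([math.log1p(r[s]) for s in species], y)
                      for r in regions.values()]
        scores[tf] = sum(per_region) / len(per_region)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical affinities: TF_A binding drops in the loss species, TF_B is unchanged.
phenotype = {"sp_a": 1, "sp_b": 1, "sp_c": 0, "sp_d": 0}
affinities = {
    "TF_A": {"r1": {"sp_a": 50, "sp_b": 40, "sp_c": 2, "sp_d": 1},
             "r2": {"sp_a": 30, "sp_b": 35, "sp_c": 3, "sp_d": 2}},
    "TF_B": {"r1": {"sp_a": 5, "sp_b": 5, "sp_c": 5, "sp_d": 5},
             "r2": {"sp_a": 4, "sp_b": 4, "sp_c": 4, "sp_d": 4}},
}
for tf, score in rank_tfs(affinities, phenotype):
    print(tf, round(score, 3))
```

The top-ranked TFs from such a list could then serve as the TF input for a region-level analysis, mirroring the TFforge-to-REforge combination described in the text.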
Overall, the methods introduced in this thesis exemplify the power of computational tools in comparative genomics to catalyze biological insights. I not only described the methods in detail but also validated them with a real-data analysis. REforge and TFforge are applicable to a wide range of phenotypes, each on its own, for associating regulatory regions and TFs, respectively, with a phenotype. Moreover, their combination in particular constitutes a valuable tool set for evo-devo studies of gene regulatory networks.
Identifier | oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:31172 |
Date | 18 September 2018 |
Creators | Langer, Björn |
Contributors | Hiller, Michael, Sbalzarini, Ivo, Stadler, Peter, TU Dresden |
Publisher | Center for Systems Biology Dresden |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Detected Language | English |
Type | doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text |
Rights | info:eu-repo/semantics/openAccess |