Global ETD Search

1	The long and the short of computational ncRNA prediction Rose, Dominic 12 November 2010 Non-coding RNAs (ncRNAs) are transcripts that function directly as RNA molecules without ever being translated into protein. The transcriptional output of eukaryotic cells is diverse, pervasive, and multi-layered. It consists of spliced as well as unspliced transcripts of both protein-coding messenger RNAs and functional ncRNAs. However, it also contains degradable non-functional by-products and artefacts, which is certainly one reason why ncRNAs were long wrongly dismissed as transcriptional noise. Today, RNA-controlled regulatory processes are broadly recognized for a variety of ncRNA classes. The thermoresponsive ROSE ncRNA (repression of heat shock gene expression) is only one example of a regulatory ncRNA acting at the post-transcriptional level via conformational changes of its secondary structure. Bioinformatics helps to identify novel ncRNAs in the bulk of genomic and transcriptomic sequence data, which are produced at ever-increasing rates. However, ncRNA annotation is unfortunately not part of generic genome annotation pipelines. Dedicated computational searches for particular ncRNAs are veritable research projects in their own right. Despite best efforts, ncRNAs across the animal phylogeny remain to a large extent uncharted territory. This thesis describes a comprehensive collection of exploratory bioinformatic field studies designed to predict ncRNA genes de novo in a series of computational screens and in a multitude of newly sequenced genomes. Non-coding RNAs can be divided into subclasses (families) according to particular functional, structural, or compositional similarities. A simple but suitable and frequently applied criterion for classifying RNA species is length.
Accordingly, the thesis is structured into two parts: we present a series of pilot studies investigating (1) the short and (2) the long ncRNA repertoire of several model species by means of state-of-the-art bioinformatic techniques. In the first part of the thesis, we focus on the detection of short ncRNAs exhibiting thermodynamically stable and evolutionarily conserved secondary structures. We provide evidence for the presence of short structured ncRNAs in a variety of species, ranging from bacteria to insects and higher eukaryotes. In particular, we highlight drawbacks and opportunities of RNAz-based ncRNA prediction in several hitherto scarcely investigated scenarios, for example ncRNA prediction in the light of whole-genome duplications. A recent microarray study provides experimental evidence for our approach: differential expression of at least one-sixth of our drosophilid RNAz predictions has been reported. Beyond the means of RNAz, we moreover manually compile a sophisticated annotation of short ncRNAs in schistosomes. Obviously, accumulating knowledge about the genetic material of malaria-causing parasites, which infect millions of humans worldwide, is of utmost scientific interest. Since the performance of any comparative genomics approach is limited by the quality of its input alignments, we introduce a novel, lightweight, and performant genome-wide alignment approach: NcDNAlign. Although the tool is optimized for speed rather than sensitivity and requires only a minor fraction of the CPU time of existing programs, we demonstrate that it is essentially as sensitive and specific as competing approaches when applied to genome-wide ncRNA gene finding and the analysis of ultra-conserved regions. By design, however, prediction approaches that search for regions with an excess of mutations that maintain secondary structure motifs will miss ncRNAs that are unstructured or whose structure is not well conserved in evolution.
In the second part of the thesis, we therefore move beyond secondary structure prediction and, based on splice site detection, develop novel strategies specifically designed to identify long ncRNAs in genomic sequences, probably the open problem in current RNA research. We perform splice-site-anchored gene finding in drosophilid, nematode, and vertebrate genomes and, at least for a subset of the candidate genes obtained, provide experimental evidence for expression and for the existence of novel spliced transcripts, confirming our approach. In summary, we found evidence for a large number of previously undescribed RNAs, which consolidates the idea of non-coding RNAs as an abundant class of regulatorily active transcripts. Certainly, ncRNA prediction is a complex task. This thesis, however, gives reasoned advice on how to unveil the RNA complement of newly sequenced genomes. Since our results have already prompted subsequent computational as well as experimental studies, we believe that we have lastingly stimulated the field of RNA research and contributed to an enriched view of the subject. ncRNA prediction; comparative genomics; splice site prediction; ddc:000
2	Bioinformatics approaches to analysing RNA mediated regulation of gene expression Childs, Liam January 2010 The genome can be considered the blueprint for an organism. Composed of DNA, it harbours all organism-specific instructions for the synthesis of all structural components and their associated functions. The role of carrying the actual molecular structure and function was believed to be assumed exclusively by proteins encoded in particular segments of the genome, the genes. In the process of converting the information stored in genes into functional proteins, RNA (a third major molecule class) was discovered early on to act as a messenger by copying the genomic information and relaying it to the protein-synthesizing machinery. Furthermore, RNA molecules were identified that assist in the assembly of amino acids into native proteins. For a long time, these rather passive roles were thought to be the sole purpose of RNA. However, in recent years, new discoveries have led to a radical revision of this view. First, RNA molecules with catalytic functions, thought to be the exclusive domain of proteins, were discovered. Then, scientists realized that much more of the genomic sequence is transcribed into RNA molecules than there are proteins in cells, raising the question of what the function of all these molecules is. Furthermore, very short and altogether new types of RNA molecules that seemingly play a critical role in orchestrating cellular processes were discovered. Thus, RNA has become a central research topic in molecular biology, even to the extent that some researchers dub cells “RNA machines”. This thesis aims to contribute towards our understanding of RNA-related phenomena by applying bioinformatics methods. First, we performed a genome-wide screen to identify sites at which the chemical composition of DNA (the genotype) critically influences phenotypic traits (the phenotype) of the model plant Arabidopsis thaliana.
Whole-genome hybridisation arrays were used, and an informatics strategy was developed, to identify polymorphic sites from hybridisation to genomic DNA. Following this approach, not only were genotype-phenotype associations discovered across the entire Arabidopsis genome, but regions not currently known to encode proteins were also identified, which thus represent candidate sites for novel functional RNA molecules. By statistically associating them with phenotypic traits, clues as to their particular functions were obtained. Furthermore, these candidate regions were subjected to a novel method for predicting the functional class of RNA molecules, developed as part of this thesis. While determining the chemical structure (the sequence) of a candidate RNA molecule is relatively straightforward, elucidating its structure-function relationship is much more challenging. To this end, we devised and implemented a novel algorithmic approach to predict the structural and, thereby, functional class of RNA molecules. In this algorithm, the concept of treating RNA molecule structures as graphs was introduced. We demonstrate that this abstraction of the actual structure leads to meaningful results that may greatly assist in the characterization of novel RNA molecules. Furthermore, by using graph-theoretic properties as descriptors of structure, we identified particular structural features of RNA molecules that may determine their function, thus providing new insights into the structure-function relationships of RNA. The method (termed Grapple) has been made available to the scientific community as a web-based service. RNA has taken centre stage in molecular biology research, and novel discoveries can be expected to further solidify the central role of RNA in the origin and support of life on earth. As illustrated by this thesis, bioinformatics methods will continue to play an essential role in these discoveries.
/ The genome of an organism contains all the information for the synthesis of all structural components and their respective functions. For a long time it was assumed that proteins, which are encoded in defined segments of the genome (the genes), are the sole carriers of molecular, and above all catalytic, functions. In the process of converting the genetic information of genes into the function of proteins, RNA molecules were identified as a further central class of molecules. They act as messenger molecules (mRNA) and, as carrier molecules (in the form of tRNA), support the assembly of the individual amino acid building blocks into native proteins. These rather passive functions were long assumed to be the only functions of RNA molecules. However, new discoveries led to a radical reassessment of the role of RNA. RNA molecules with catalytic properties, so-called ribozymes, were discovered. It was also found that, beyond the protein-coding segments, far more genomic sequence regions are read and transcribed into RNA molecules than had been assumed. In addition, very small and novel RNA molecules were identified that play a decisive part in coordinating gene expression. These discoveries moved RNA as a molecule class into the focus of modern molecular biology research and led to a reassessment of its functional role. This doctoral thesis attempts to contribute to the understanding of RNA-related phenomena with the help of bioinformatic methods. First, a genome-wide search was carried out for segments in the genome of the model plant Arabidopsis thaliana whose altered chemical structure (the genotype) decisively influences the expression of selected traits (the phenotype).
For this purpose, so-called whole-genome hybridisation chips were used and a bioinformatic strategy was developed to detect changes in chemical structure (polymorphisms) from the altered binding of genomic DNA from different Arabidopsis cultivars to defined probes on the chip. This search not only systematically uncovered genotype-phenotype associations but also identified regions that are not yet annotated as protein-coding segments and nevertheless appear to determine the expression of a specific trait. These regions were further examined for possible new RNA molecules that might be encoded in them. A new algorithm, likewise developed as part of this thesis, was used for this. While determining the chemical structure (the sequence) of an RNA molecule belongs to the standard repertoire of molecular biologists, elucidating both the structure and the specific function of the molecule is far more difficult. To this end, a new algorithmic approach was developed in this thesis that uses computational methods to assign RNA molecules to particular functional classes. It draws on the concept of describing RNA secondary structures as graphs. It could be shown that this abstraction from the concrete structure leads to useful statements about function. It could further be demonstrated that graph-theoretically derived features of RNA molecules open up a new way of understanding structure-function relationships. The method developed (Grapple) has been made available to the scientific community as a web-based application.
RNA has established itself as a central research subject of molecular biology, and new discoveries can be expected that will further substantiate the central role of RNA in the origin and maintenance of life on earth. Bioinformatic methods will continue to play an essential role in this. de novo ncRNA prediction; association mapping; support vector machine; RNA; genotyping; Life sciences
3	The long and the short of computational ncRNA prediction Rose, Dominic 11 March 2010 info:eu-repo/classification/ddc/000 ddc:000
