  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Arthropod 7SK RNA

Gruber, Andreas R., Kilgus, Carsten, Mosig, Axel, Hofacker, Ivo L., Hennig, Wolfgang, Stadler, Peter F. 25 January 2019
The 7SK small nuclear RNA (snRNA) is a key player in the regulation of polymerase (pol) II transcription. The 7SK RNA was long believed to be specific to vertebrates, where it is highly conserved. Homologs in basal deuterostomes and a few lophotrochozoan species were reported only recently. On longer timescales, 7SK evolves rapidly, with only a few conserved sequence and structure motifs. Previous attempts to identify the Drosophila homolog have thus remained unsuccessful despite considerable efforts. Here we report on the discovery of arthropod 7SK RNAs using a novel search strategy based on pol III promoters, as well as the subsequent verification of their expression. Our results demonstrate that a 7SK snRNA featuring two highly structured conserved domains was already present in the bilaterian ancestor.
2

Efficient Homology Search for Genomic Sequence Databases

Cameron, Michael, mcam@mc-mc.net January 2006
Genomic search tools can provide valuable insights into the chemical structure, evolutionary origin and biochemical function of genetic material. A homology search algorithm compares a protein or nucleotide query sequence to each entry in a large sequence database and reports alignments with highly similar sequences. The exponential growth of public data banks such as GenBank has necessitated the development of fast, heuristic approaches to homology search. The versatile and popular blast algorithm, developed by researchers at the US National Center for Biotechnology Information (NCBI), uses a four-stage heuristic approach to efficiently search large collections for analogous sequences while retaining a high degree of accuracy. Despite an abundance of alternative approaches to homology search, blast remains the only method to offer fast, sensitive search of large genomic collections on modern desktop hardware. As a result, the tool has found widespread use with millions of queries posed each day. A significant investment of computing resources is required to process this large volume of genomic searches and a cluster of over 200 workstations is employed by the NCBI to handle queries posed through the organisation's website. As the growth of sequence databases continues to outpace improvements in modern hardware, blast searches are becoming slower each year and novel, faster methods for sequence comparison are required. In this thesis we propose new techniques for fast yet accurate homology search that result in significantly faster blast searches. First, we describe improvements to the final, gapped alignment stages where the query and sequences from the collection are aligned to provide a fine-grain measure of similarity. We describe three new methods for aligning sequences that roughly halve the time required to perform this computationally expensive stage. 
Next, we investigate improvements to the first stage of search, where short regions of similarity between a pair of sequences are identified. We propose a novel deterministic finite automaton data structure that is significantly smaller than the codeword lookup table employed by ncbi-blast, resulting in improved cache performance and faster search times. We also discuss fast methods for nucleotide sequence comparison. We describe novel approaches for processing sequences that are compressed using the byte packed format already utilised by blast, where four nucleotide bases from a strand of DNA are stored in a single byte. Rather than decompress sequences to perform pairwise comparisons, our innovations permit sequences to be processed in their compressed form, four bases at a time. Our techniques roughly halve average query evaluation times for nucleotide searches with no effect on the sensitivity of blast. Finally, we present a new scheme for managing the high degree of redundancy that is prevalent in genomic collections. Near-duplicate entries in sequence data banks are highly detrimental to retrieval performance; however, existing methods for managing redundancy are both slow, requiring almost ten hours to process the GenBank database, and crude, because they simply purge highly similar sequences to reduce the level of internal redundancy. We describe a new approach for identifying near-duplicate entries that is roughly six times faster than the most successful existing approaches, and a novel approach to managing redundancy that reduces collection size and search times but still provides accurate and comprehensive search results. Our improvements to blast have been integrated into our own version of the tool. We find that our innovations more than halve average search times for nucleotide and protein searches, and have no significant effect on search accuracy. 
Given the enormous popularity of blast, this represents a very significant advance in computational methods to aid life science research.
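The idea of comparing byte-packed nucleotide sequences four bases at a time can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the 2-bit codes in `CODE`, the helper names `pack` and `count_matches`, and the lookup-table approach are all assumptions made for illustration, and the sketch only handles sequences whose length is a multiple of four.

```python
# Hypothetical 2-bit encoding; the actual bit layout used by blast is not
# stated in the abstract above.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(seq: str) -> bytes:
    """Pack a nucleotide string into bytes, four bases per byte."""
    if len(seq) % 4:
        raise ValueError("this sketch requires a length that is a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | CODE[base]
        out.append(byte)
    return bytes(out)


def _zero_fields(x: int) -> int:
    """Count the 2-bit fields of x that are zero (i.e. matching bases)."""
    return sum(1 for shift in (0, 2, 4, 6) if (x >> shift) & 0b11 == 0)


# For every possible XOR of two packed bytes, precompute how many of the
# four base positions agree. One table lookup then scores four bases.
MATCH_TABLE = [_zero_fields(x) for x in range(256)]


def count_matches(a: bytes, b: bytes) -> int:
    """Count identical base positions without unpacking either sequence."""
    if len(a) != len(b):
        raise ValueError("packed sequences must have equal length")
    return sum(MATCH_TABLE[x ^ y] for x, y in zip(a, b))
```

For example, `count_matches(pack("ACGTACGT"), pack("ACGAACTT"))` scores eight bases with two table lookups instead of eight character comparisons.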
3

A Study in RNA Bioinformatics : Identification, Prediction and Analysis

Freyhult, Eva January 2007
Research in the last few decades has revealed the great capacity of the RNA molecule. RNA, which was previously assumed to play a main role only as an intermediate in the translation of genes to proteins, is today known to play many important roles in the cell in addition to those of messenger RNA and transfer RNA, including the ability to catalyze reactions and to regulate genes at various levels.
This thesis investigates several computational aspects of RNA. We will discuss identification of novel RNAs and RNAs that are known to exist in related species, RNA secondary structure prediction, as well as more general tools for analyzing, visualizing and classifying RNA sequences.
We present two benchmark studies concerning RNA identification, covering both de novo identification/characterization of single RNA sequences and homology search methods.
We develop a novel algorithm for analysis of the RNA folding landscape that is based on the nearest neighbor energy model adopted in many secondary structure prediction programs. We implement this algorithm, which computes structural neighbors of a given RNA secondary structure, in the program RNAbor, which is accessible on a web server.
Furthermore, we combine a mutual information based structure prediction algorithm with a sequence logo visualization to create a novel visualization tool for analyzing an RNA alignment and identifying covarying sites.
Finally, we present extensions to sequence logos for the purpose of tRNA identity analysis. We introduce function logos, which display features that distinguish functional subclasses within a large set of structurally related sequences, as well as inverse logos, which display underrepresented features. For the purpose of comparing tRNA identity elements between different taxa we introduce two contrasting logos, the information difference and the Kullback-Leibler divergence difference logos.
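To make the notion of covarying sites concrete, the following Python sketch computes the mutual information between two columns of an RNA alignment. It is a generic textbook formulation, not the algorithm from the thesis; the function names `mutual_information` and `column` are hypothetical, and gap characters are treated as ordinary symbols for simplicity.

```python
from collections import Counter
from math import log2


def mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (in bits) between two alignment columns.

    Each column is given as a string with one character per sequence.
    Assumption of this sketch: gaps are counted like any other symbol.
    """
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (x, y), count in freq_ij.items():
        p_xy = count / n
        p_x = freq_i[x] / n
        p_y = freq_j[y] / n
        mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi


def column(alignment: list[str], index: int) -> str:
    """Extract one column from an alignment given as equal-length rows."""
    return "".join(row[index] for row in alignment)
```

Two columns that always change together (for instance G-C pairs replaced by C-G pairs) score high, whereas a fully conserved column carries no mutual information with any other column.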
5

Improved Workflows for RNA Homology Search

Yazbeck, Ali 24 July 2019
Non-coding RNAs are the most abundant class of RNAs found throughout genomes. These RNAs are key players in gene regulation and thus in the function of whole organisms. Numerous methods have been developed so far for detecting novel classes of ncRNAs or finding homologs of the known ones. Because of their abundance, the sequence availability of these RNAs is rapidly increasing, as is the case, for example, for microRNAs. However, for some classes only incomplete information is available, the invertebrate 7SK snRNA for instance. Consequently, many false positive outputs are produced in the former case, and more accurate annotation methods are needed in the latter cases to improve derivable knowledge. This makes gathering correct homologs accurately a challenging task, and it leads directly to a no less important problem, the curation of these data. Finding solutions for the aforementioned problems is more complex than one would expect, as these RNAs are characterized not only by sequence information but also by structure information, in addition to distinct biological features. In this work, data curation methods and sensitive homology search are shown to be complementary methods for solving these problems. A careful curation and annotation method revealed new structural information in the invertebrate 7SK snRNA, which pushes the investigation in the area forward. This is reflected in the detection of new high-potential 7SK RNA genes in different invertebrate groups. Moreover, the gaps between homology search and well-curated data on the one side, and between experimental and computational outputs on the other side, are closed. These gaps were bridged by a curation method applied to the microRNA data, which was then turned into a comprehensive workflow implemented as an automated pipeline. 
MIRfix is a microRNA curation pipeline that considers the detailed sequence and structure information of the metazoan microRNAs, together with biological features related to microRNA biogenesis. Moreover, this pipeline can be integrated into existing methods and tools related to microRNA homology search and data curation. The application of this pipeline to the biggest open source microRNA database revealed its high capacity for detecting wrongly annotated pre-miRNAs, eventually improving the alignment quality of the majority of the available data. Additionally, it was tested with artificial datasets, highlighting its high accuracy in predicting the pre-miRNA components, miRNA and miRNA*.
Table of contents:
Chapter 1: Introduction
Chapter 2: Biological and Computational background
  2.1 Biology
    2.1.1 Non-coding RNAs
    2.1.2 RNA secondary structure
    2.1.3 Homology versus similarity
    2.1.4 Evolution
  2.2 The role of computational biology
    2.2.1 Alignment
      2.2.1.1 Pairwise alignment
      2.2.1.2 Multiple sequence alignment (MSA)
    2.2.2 Homology search
      2.2.2.1 Sequence-based
      2.2.2.2 Structure-based
    2.2.3 RNA secondary structure prediction
Chapter 3: Careful curation for snRNA
  3.1 Biological background
  3.2 Introduction to the problem
  3.3 Methods
    3.3.1 Initial seeds and models construction
    3.3.2 Models anatomy then merging
  3.4 Results
    3.4.1 Refined model of arthropod 7SK RNA
      3.4.1.1 5’ Stem
      3.4.1.2 Extension of Stem A
      3.4.1.3 Novel stem B in invertebrates
      3.4.1.4 3’ Stem
    3.4.2 Invertebrates model conserves the HEXIM1 binding site
    3.4.3 Computationally high potential 7SK RNA candidate
    3.4.4 Sensitivity of the final proposed model
  3.5 Conclusion
Chapter 4: Behind the scenes of microRNA driven regulation
  4.1 Biological background
  4.2 Databases and problems
  4.3 MicroRNA detection and curation approaches
Chapter 5: Initial microRNA curation
  5.1 Introduction
  5.2 Methods
    5.2.1 Data pre-processing
    5.2.2 Initial seeds creation
    5.2.3 Main course
  5.3 Results and discussion
  5.4 Conclusion
Chapter 6: MIRfix pipeline
  6.1 Introduction
  6.2 Methods
    6.2.1 Inputs and Outputs
    6.2.2 Prediction of the mature sequences
    6.2.3 The original precursor and its alternative
    6.2.4 The validation of the precursor
    6.2.5 Alignment processing
  6.3 Results and statistics
  6.4 Applications
    6.4.1 Real life examples and artificial data tests
    6.4.2 miRNA and miRNA* prediction
    6.4.3 Covariance models
  6.5 Conclusion
Chapter 7: Discussion
6

Non-coding RNA annotation of the genome of Trichoplax adhaerens

Hertel, Jana, de Jong, Danielle, Marz, Manja, Rose, Dominic, Tafer, Hakim, Tanzer, Andrea, Schierwater, Bernd, Stadler, Peter F. 04 February 2019
A detailed annotation of non-protein coding RNAs is typically missing in initial releases of newly sequenced genomes. Here we report on a comprehensive ncRNA annotation of the genome of Trichoplax adhaerens, presumably the most basal metazoan whose genome has been published to date. Since blast identified only a small fraction of the best-conserved ncRNAs (in particular rRNAs, tRNAs and some snRNAs), we developed a semi-global dynamic programming tool, GotohScan, to increase the sensitivity of the homology search. It successfully identified the full complement of major and minor spliceosomal snRNAs, the genes for RNase P and MRP RNAs, the SRP RNA, as well as several small nucleolar RNAs. We did not find any microRNA candidates homologous to known eumetazoan sequences. Interestingly, most ncRNAs, including the pol-III transcripts, appear as single-copy genes or with very small copy numbers in the Trichoplax genome.
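The semi-global alignment idea mentioned above (align the whole query, but let it start and end anywhere in a longer target) can be sketched in Python as follows. This is not GotohScan itself: the function name `semiglobal_score` and the scoring parameters are assumptions, and for brevity the sketch uses a simple linear gap penalty rather than the affine gap costs of Gotoh's algorithm.

```python
def semiglobal_score(query: str, target: str,
                     match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, int]:
    """Best score for aligning all of `query` somewhere inside `target`.

    Returns (score, end), where `end` is the number of target characters
    consumed when the best alignment finishes. Leading and trailing
    unaligned target characters are free; gaps inside the alignment cost
    `gap` each (linear penalty, a simplification chosen for this sketch).
    """
    n = len(target)
    # Row 0: the alignment may start at any target position at no cost.
    prev = [0] * (n + 1)
    for i in range(1, len(query) + 1):
        # Column 0: the first i query characters are aligned to gaps.
        cur = [i * gap] + [0] * n
        for j in range(1, n + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,   # (mis)match
                         prev[j] + gap,       # query char against a gap
                         cur[j - 1] + gap)    # target char against a gap
        prev = cur
    # Last row: the alignment may end at any target position at no cost.
    end = max(range(n + 1), key=lambda j: prev[j])
    return prev[end], end
```

Scanning a genome with such a routine reports, for each query, the best-scoring location in the target; a real tool would also trace back the alignment and assess the significance of the score.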
