Non-coding RNAs (ncRNAs) are the most abundant class of RNAs found
throughout genomes. These RNAs are key players in gene regulation and thus
in the function of whole organisms. Numerous methods have been developed for
detecting novel classes of ncRNAs or finding homologs of known ones.
Because of their abundance, the sequence availability of these RNAs is rapidly
increasing, as is the case, for example, for microRNAs. For other classes,
however, only incomplete information is available, for instance the
invertebrate 7SK snRNA. Consequently, many false positives are produced in
the former case, and more accurate annotation methods are needed in the
latter case to improve the knowledge that can be derived. This makes
accurately gathering correct homologs a challenging task, and it leads
directly to a no less important problem: the curation of these data.
Finding solutions to these problems is more complex than one would expect,
as these RNAs are characterized not only by sequence information but also by
structure information, in addition to distinct biological features.
In this work, data curation methods and sensitive homology search are shown
to be complementary approaches to solving these problems. A careful curation
and annotation method revealed new structural information in the invertebrate
7SK snRNA, which pushes the investigation in this area forward. This is
reflected in the detection of new high-potential 7SK RNA genes in different
invertebrate groups. Moreover, the gaps between homology search and
well-curated data on the one side, and between experimental and computational
outputs on the other side, are closed. These gaps were bridged by a curation
method applied to the microRNA data, which was then turned into a
comprehensive workflow implemented as an automated pipeline. MIRfix is a
microRNA curation pipeline that considers the detailed sequence and structure
information of metazoan microRNAs, together with biological features
related to microRNA biogenesis. Moreover, this pipeline can be integrated
into existing methods and tools for microRNA homology search and
data curation. Applying this pipeline to the biggest open source
microRNA database revealed its high capacity for detecting wrongly annotated
pre-miRNAs, eventually improving the alignment quality of the majority of the
available data. Additionally, it was tested with artificial datasets,
highlighting its high accuracy in predicting the pre-miRNA components, miRNA and
miRNA*.

Chapter 1: Introduction
Chapter 2: Biological and Computational background
2.1 Biology
2.1.1 Non-coding RNAs
2.1.2 RNA secondary structure
2.1.3 Homology versus similarity
2.1.4 Evolution
2.2 The role of computational biology
2.2.1 Alignment
2.2.1.1 Pairwise alignment
2.2.1.2 Multiple sequence alignment (MSA)
2.2.2 Homology search
2.2.2.1 Sequence-based
2.2.2.2 Structure-based
2.2.3 RNA secondary structure prediction
Chapter 3: Careful curation for snRNA
3.1 Biological background
3.2 Introduction to the problem
3.3 Methods
3.3.1 Initial seeds and models construction
3.3.2 Models anatomy then merging
3.4 Results
3.4.1 Refined model of arthropod 7SK RNA
3.4.1.1 5’ Stem
3.4.1.2 Extension of Stem A
3.4.1.3 Novel stem B in invertebrates
3.4.1.4 3’ Stem
3.4.2 Invertebrates model conserves the HEXIM1 binding site
3.4.3 Computationally high potential 7SK RNA candidate
3.4.4 Sensitivity of the final proposed model
3.5 Conclusion
Chapter 4: Behind the scenes of microRNA driven regulation
4.1 Biological background
4.2 Databases and problems
4.3 MicroRNA detection and curation approaches
Chapter 5: Initial microRNA curation
5.1 Introduction
5.2 Methods
5.2.1 Data pre-processing
5.2.2 Initial seeds creation
5.2.3 Main course
5.3 Results and discussion
5.4 Conclusion
Chapter 6: MIRfix pipeline
6.1 Introduction
6.2 Methods
6.2.1 Inputs and Outputs
6.2.2 Prediction of the mature sequences
6.2.3 The original precursor and its alternative
6.2.4 The validation of the precursor
6.2.5 Alignment processing
6.3 Results and statistics
6.4 Applications
6.4.1 Real life examples and artificial data tests
6.4.2 miRNA and miRNA* prediction
6.4.3 Covariance models
6.5 Conclusion
Chapter 7: Discussion
Identifier | oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:34681 |
Date | 24 July 2019 |
Creators | Yazbeck, Ali |
Contributors | Universität Leipzig |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Detected Language | English |
Type | info:eu-repo/semantics/publishedVersion, doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text |
Rights | info:eu-repo/semantics/openAccess |