3 |
Computational approaches for RNA energy parameter estimation
Andronescu, Mirela Stefania 05 1900
RNA molecules play important roles, including catalysis of chemical reactions and control of gene expression, and their functions largely depend on their folded structures. Since determining these structures by biochemical means is expensive, there is increased demand for computational predictions of RNA structures. One computational approach is to find the secondary structure (a set of base pairs) that minimizes a free energy function for a given RNA conformation. The forces driving RNA folding can be approximated by means of a free energy model, which associates a free energy parameter with each distinct feature considered.
The main goal of this thesis is to develop state-of-the-art computational approaches that can significantly increase the accuracy (i.e., maximize the number of correctly predicted base pairs) of RNA secondary structure prediction methods, by improving and refining the parameters of the underlying RNA free energy model.
We propose two general approaches to estimate RNA free energy parameters. The Constraint Generation (CG) approach is based on iteratively generating constraints that enforce known structures to have energies lower than other structures for the same molecule. The Boltzmann Likelihood (BL) approach infers a set of RNA free energy parameters which maximize the conditional likelihood of a set of known RNA structures. We discuss several variants and extensions of these two approaches, including a linear Gaussian Bayesian network that defines relationships between features. Overall, BL gives slightly better results than CG, but it is over ten times more expensive to run. In addition, CG requires software that is much simpler to implement.
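The two estimation criteria can be written compactly. The notation below is an assumption made for illustration, since the abstract does not fix symbols: x is an RNA sequence, y a secondary structure, y* the known structure of x, c_i(x, y) the number of occurrences of feature i, θ_i its free energy parameter, R the gas constant, and T the temperature. The energy is assumed to be linear in the feature counts, which is the usual reading of a model with one parameter per feature.

```latex
\begin{align}
\Delta G(x, y; \theta) &= \sum_i c_i(x, y)\,\theta_i \\
\text{CG:}\quad \Delta G(x, y^{*}; \theta) &< \Delta G(x, y; \theta)
  \quad \text{for each generated alternative } y \neq y^{*} \\
\text{BL:}\quad P(y^{*} \mid x, \theta) &=
  \frac{\exp\!\big(-\Delta G(x, y^{*}; \theta)/RT\big)}
       {\sum_{y'} \exp\!\big(-\Delta G(x, y'; \theta)/RT\big)}
\end{align}
```

Under these assumptions, BL chooses θ to maximize the product of P(y* | x, θ) over the set of known structures. The denominator sums over all structures of x, whereas CG only compares the known structure against the alternatives generated so far.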
We obtain significant improvements in the accuracy of RNA minimum free energy secondary structure prediction with and without pseudoknots (regions of non-nested base pairs), when measured on large sets of RNA molecules with known structures. For the Turner model, which has been the gold-standard model without pseudoknots for more than a decade, the average prediction accuracy of our new parameters increases from 60% to 71%. For two models with pseudoknots, we obtain an increase of 9% and 6%, respectively. To the best of our knowledge, our parameters are currently state-of-the-art for the three considered models. / Science, Faculty of / Computer Science, Department of / Graduate
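The notion of accuracy used above (correctly predicted base pairs) can be made concrete with a small sketch. The abstract does not state the exact metric, so the code below reports three common choices: sensitivity, positive predictive value, and their harmonic mean (F-measure). The dot-bracket input format and the function names are assumptions for illustration, and plain parentheses cannot represent pseudoknots.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack = []
    pairs = set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def pair_accuracy(reference, predicted):
    """Compare predicted base pairs against a known (reference) structure."""
    ref = base_pairs(reference)
    pred = base_pairs(predicted)
    correct = len(ref & pred)
    # Sensitivity: fraction of known pairs that were predicted.
    sensitivity = correct / len(ref) if ref else 0.0
    # PPV: fraction of predicted pairs that are in the known structure.
    ppv = correct / len(pred) if pred else 0.0
    total = sensitivity + ppv
    f_measure = 2 * sensitivity * ppv / total if total else 0.0
    return sensitivity, ppv, f_measure


if __name__ == "__main__":
    print(pair_accuracy("((..))", "(....)"))
```

Averaging such a score over a large set of molecules with known structures gives a single figure comparable in spirit to the percentages quoted above.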
|
4 |
Evolving Towards the Hypercycle: A Spatial Model of Molecular Evolution
Attolini, Camille Stephan-Otto, Stadler, Peter F. 04 October 2018
We extend earlier cellular automata models of spatially extended hypercycles by including an explicit genetic component in the model. This allows us to study the sequence evolution of hypercyclically coupled molecular replicators in addition to considering their population dynamics and spatial organization. In line with previous models that considered either spatial organization or sequence evolution alone, we find both temporal oscillations of the relative concentration of the species forming the hypercycles and the formation of spatial organisations including spiral waves. We also confirm the greatly increased robustness of the spatially extended hypercycle against various classes of parasites. We find that the sequence evolution of each of the hypercyclically coupled populations proceeds (after an initial selection-dominated phase) in a drift-like manner that can be described by a diffusion process in sequence space. Kimura's theory of neutral evolution is therefore applicable on long time-scales despite the fact that the hypercycle exhibits extreme periodic changes in population sizes and is governed solely by frequency-dependent selection.
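For orientation, the hypercyclic coupling itself can be illustrated with the textbook well-mixed replicator equations, in which species i is replicated with the help of species i-1. This is not the spatial cellular automaton with explicit sequences described above. The rate constant `k`, the step size `dt`, the number of species, and the function names are all assumptions chosen for illustration.

```python
def hypercycle_step(x, k=1.0, dt=0.01):
    """One explicit Euler step of dx_i/dt = x_i * (k * x_{i-1} - phi)."""
    n = len(x)
    # Species i grows in proportion to its own and its predecessor's
    # concentration; x[i - 1] wraps around for i == 0, closing the cycle.
    growth = [k * x[i] * x[i - 1] for i in range(n)]
    # The dilution flux phi keeps the total concentration constant.
    phi = sum(growth)
    return [x[i] + dt * (growth[i] - x[i] * phi) for i in range(n)]


def simulate(x0, steps=1000, k=1.0, dt=0.01):
    """Return the list of concentration vectors visited from x0."""
    x = list(x0)
    trajectory = [x]
    for _ in range(steps):
        x = hypercycle_step(x, k, dt)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    final = simulate([0.3, 0.25, 0.2, 0.15, 0.1])[-1]
    print([round(v, 4) for v in final])
```

The relative concentrations in the trajectory can be plotted against the step index to inspect how the species succeed one another around the cycle.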
|
5 |
Probabilistic models of RNA secondary structure
Anderson, James William Justin January 2013
This thesis develops probabilistic models of RNA secondary structure. The first chapter introduces RNA secondary structure prediction, in particular stochastic context-free grammars (SCFGs), and considers a novel method for automated design of SCFGs. Many SCFGs are found with predictive quality similar to that of the grammars commonly used for RNA secondary structure prediction. The second chapter discusses the effect that alignment quality, evolutionary distance between sequences, and number of sequences in an alignment have on RNA secondary structure prediction. By combining statistical alignment and SCFG models we can, in a statistically sound setting, average structure predictions over the space of alignments to decrease the loss created by poor alignments. The third chapter incorporates additional biological information about RNA secondary structure formation into the decoding of the SCFG posterior distribution. Combining iterative helix formation, phylogenetic modelling, and a distance function between alignment columns leads to an improvement in the accuracy of comparative RNA secondary structure prediction. Finally, appendices briefly discuss further work concerning probabilistic models of RNA secondary structure which may be of interest to the reader.
|
6 |
RNA secondary structure prediction using a combined method of thermodynamics and kinetics
Pan, Minmin 07 July 2011
RNA is now widely acknowledged to play important roles in information transfer, as a structural component, and in gene regulation, among other functions. The secondary structure of RNA is key to understanding the structure-function relationship. Computational prediction of RNA secondary structure not only provides possible structures but also elucidates the mechanism of RNA folding. Conventional prediction programs either take an evolutionary perspective or aim to find the minimum free energy structure. In vivo, RNA folds during transcription, which indicates that the native RNA structure results from both thermodynamics and kinetics.
In this thesis, I first reviewed the current leading kinetic folding programs and demonstrated that these programs are not able to predict secondary structure accurately. Building on that, I proposed a new sequential folding program called GTkinetics. Given an RNA sequence, GTkinetics predicts a secondary structure and a series of RNA folding trajectories. It treats the RNA as a growing chain and adds stable local structures sequentially. It uses a Z-score to evaluate the stability of local structures, which makes it possible to locate native local structures with high confidence. Because GTkinetics captures all stable local structures, it produces some false positives, which prevent the native structure from forming as the chain grows. This suggests a refolding model that melts the false-positive hairpins, which are probable intermediate structures, and folds the RNA into a new structure with reliable long-range helices. By analyzing the suboptimal ensemble along the folding pathway, I suggested a refolding mechanism that can be used to evaluate whether or not refolding takes place.
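A Z-score of this kind can be sketched as follows. The abstract does not say how GTkinetics builds its background distribution, so the sketch assumes a common choice: the folding energy of a local segment is compared with the energies of shuffled versions of the same segment. The `energy_fn` callable is hypothetical and stands in for whatever energy calculation is available; the function names and defaults are likewise assumptions.

```python
import random
import statistics


def z_score(observed, background):
    """Standard score of an observed energy against background energies."""
    mean = statistics.fmean(background)
    sd = statistics.stdev(background)  # sample standard deviation
    if sd == 0:
        raise ValueError("background energies have zero variance")
    return (observed - mean) / sd


def shuffled_background(segment, energy_fn, n_shuffles=100, seed=0):
    """Energies of randomly shuffled copies of a sequence segment.

    energy_fn is a hypothetical callable mapping a sequence to an energy.
    """
    rng = random.Random(seed)
    energies = []
    for _ in range(n_shuffles):
        letters = list(segment)
        rng.shuffle(letters)  # preserves nucleotide composition
        energies.append(energy_fn("".join(letters)))
    return energies
```

With energies where lower means more stable, a strongly negative Z-score marks a segment that is more stable than expected from its nucleotide composition alone.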
As another way to favor local structures over long-distance structures, I introduced a distance penalty function into the free energy calculation. I used a sigmoidal function to compute the energy penalty according to the distance in the primary sequence between the two nucleotides of a base pair. For both the training dataset and the test dataset, the distance function improves the prediction to some extent.
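A sigmoidal distance penalty of this kind might look like the sketch below. The functional form is the standard logistic curve; the parameter names and default values (`max_penalty`, `midpoint`, `steepness`) are hypothetical and are not taken from the thesis.

```python
import math


def distance_penalty(i, j, max_penalty=1.0, midpoint=100.0, steepness=0.05):
    """Energy penalty for a base pair between sequence positions i and j.

    The penalty rises smoothly from near 0 for short-range pairs to
    max_penalty for long-range pairs, reaching max_penalty / 2 when the
    sequence distance equals midpoint.
    """
    distance = abs(j - i)
    return max_penalty / (1.0 + math.exp(-steepness * (distance - midpoint)))


if __name__ == "__main__":
    for d in (10, 100, 500):
        print(d, round(distance_penalty(0, d), 4))
```

Adding this term to the energy of each base pair leaves local helices almost untouched while making long-range pairs less favorable.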
In order to characterize the differences between local and long-range helices, I analyzed the standardized local nucleotide composition and base pair composition of the two groups. The results show that adenine accumulates on the 5' side of local structures, but not on that of long-range helices. GU base pairs occur significantly more frequently in local helices than in long-range helices. These results indicate that local and long-range helices form by different mechanisms, and that this difference is encoded in the sequence itself.
Based on these results, I draw conclusions and suggest future directions for enhancing the current sequential folding program.
|
7 |
Identifying RNA secondary structures in the SARS-CoV-2 viral genome
Ziesel, Alison 21 April 2022
Motivation: SARS-CoV-2 is the virus responsible for the COVID-19 pandemic that currently impacts our world. SARS-CoV-2 is an enveloped, positive-sense, single-stranded RNA virus and, like other RNA viruses, is known to form RNA secondary structure in its genome. In related viruses, these secondary structures fulfil roles including proper expression of viral gene products and possibly regulation of viral genome replication. I hypothesize that SARS-CoV-2 may be capable of forming additional secondary structures beyond what is already known and that those secondary structures are identifiable on the basis of sequence conservation with related RNA viruses.
Results: By repurposing and expanding an existing computational pipeline designed for the detection of structural RNAs in vertebrates, I identified 40 regions of the SARS-CoV-2 genome highly likely to form secondary structure. Partial re-identification of known secondary structures in the SARS-CoV-2 genome was achieved. To further explore the role these structures may fill, the 9 most conservatively predicted structures were analyzed in wild viral samples collected from three Canadian provinces, and distinct patterns of mutation were observed. The 40 regions identified by my modified pipeline were compared against three contemporary works and the differences between findings were quantified. Lastly, Variants of Concern for SARS-CoV-2 were analyzed for prevalent but poorly reported mutations that may influence RNA secondary structure. Code developed for this work is available at https://github.com/aziesel/MSc. / Graduate / 2023-04-06
|
8 |
Vliv sekvencí intronů na efektivitu sestřihu v Saccharomyces cerevisiae. / The influence of intron sequences on splicing effectivity in Saccharomyces cerevisiae
Oplová, Michaela January 2015
Pre-mRNA splicing is a highly regulated cellular process. The tight cooperation between the spliceosome and other splicing factors, which enables the interpretation of pre-mRNA cis-elements, results in precise regulation of pre-mRNA splicing. Short conserved splicing sequences within introns are elementary and indispensable for intron removal from the primary transcript, yet they are not sufficient signals for efficient splicing. Additional pre-mRNA features contribute to the complex regulation of splicing. We took advantage of strains with a slightly disrupted spliceosome (prp45(1-169)) to study the effect of ACT1 and MAF1 intronic sequences on splicing efficiency. Here we show that the ACT1 intron region between the branch point (BP) and the 3' splice site (3'ss) maintains splicing efficiency in mutant cells. However, the specific element within this region was not determined. In addition, our results implicate an alternative BP in the modulation of splicing efficiency in the yeast Saccharomyces cerevisiae. Interestingly, this alternative BP is located in the ACT1 intron outside the BP-3'ss region. Furthermore, splicing factors with a potential influence on 3'ss selection were studied. A heterodimer composed of Slu7p and Prp18p participates in positioning the 3'ss at the active site of the spliceosome. Splicing analysis of substrates with two...
|
9 |
Identification and classification of ncRNA molecules using graph properties
Childs, Liam, Nikoloski, Zoran, May, Patrick, Walther, Dirk January 2009
The study of non-coding RNA genes has received increased attention in recent years, fuelled by accumulating evidence that larger portions of genomes than previously acknowledged are transcribed into RNA molecules of mostly unknown function, as well as by the discovery of novel non-coding RNA types and functional RNA elements. Here, we demonstrate that specific properties of graphs that represent the predicted RNA secondary structure reflect functional information. We introduce a computational algorithm and an associated web-based tool (GraPPLE) for classifying non-coding RNA molecules as functional and, furthermore, into Rfam families based on their graph properties. Compared with sequence-similarity-based methods and covariance models, GraPPLE is demonstrated to be more robust with regard to increasing sequence divergence and, when combined with existing methods, leads to a significant improvement of prediction accuracy. Furthermore, graph properties identified as most informative are shown to provide an understanding as to what particular structural features render RNA molecules functional. Thus, GraPPLE may offer a valuable computational filtering tool to identify potentially interesting RNA molecules among large candidate datasets.
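The idea of describing a predicted secondary structure by graph properties can be sketched as follows. The graph encoding (nucleotides as nodes, backbone bonds and base pairs as edges) and the particular properties computed here are assumptions for illustration; the abstract does not list the properties GraPPLE actually uses, and the function names are hypothetical.

```python
from collections import deque


def structure_graph(dot_bracket):
    """Adjacency sets for a structure given in dot-bracket notation."""
    n = len(dot_bracket)
    adj = {i: set() for i in range(n)}
    # Backbone edges connect consecutive nucleotides.
    for i in range(n - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    # Base-pair edges connect matching brackets.
    stack = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            adj[i].add(j)
            adj[j].add(i)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return adj


def eccentricity(adj, source):
    """Largest shortest-path distance from source, found by BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def graph_properties(adj):
    """A few simple descriptors that could serve as classifier features."""
    degrees = [len(neighbours) for neighbours in adj.values()]
    return {
        "nodes": len(adj),
        "edges": sum(degrees) // 2,
        "max_degree": max(degrees),
        "mean_degree": sum(degrees) / len(adj),
        "diameter": max(eccentricity(adj, s) for s in adj),
    }


if __name__ == "__main__":
    print(graph_properties(structure_graph("((...))")))
```

A feature vector of this kind, computed for each candidate molecule, could then be passed to any standard classifier.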
|
10 |
Klasifikace bakterií do taxonomických kategorií na základě vlastností 16s rRNA / Bacteria Classification into Taxonomic Categories Based on Properties of 16s rRNA
Grešová, Katarína January 2020
The main goal of this thesis was to design and implement a tool that classifies sequences of the 16S rRNA gene into taxonomic categories using the properties of the 16S rRNA gene. The tool analyzes all input sequences simultaneously, unlike common classification approaches, which classify input sequences individually. It relies on the fact that bacteria contain several copies of the 16S rRNA gene, which may differ in sequence. The main contribution of this work is the design, implementation, and evaluation of the capabilities of this tool. Experiments have shown that the proposed tool is able to identify the corresponding bacteria for smaller datasets and determine the correct ratios of their abundances. However, with larger datasets, the state space becomes very large and fragmented, and further improvements are required before the tool can search it efficiently.
|