Global ETD Search

3	Computational approaches for RNA energy parameter estimation Andronescu, Mirela Stefania 05 1900 (has links) RNA molecules play important roles, including catalysis of chemical reactions and control of gene expression, and their functions largely depend on their folded structures. Since determining these structures by biochemical means is expensive, there is increased demand for computational predictions of RNA structures. One computational approach is to find the secondary structure (a set of base pairs) that minimizes a free energy function for a given RNA conformation. The forces driving RNA folding can be approximated by means of a free energy model, which associates a free energy parameter to a distinct considered feature. The main goal of this thesis is to develop state-of-the-art computational approaches that can significantly increase the accuracy (i.e., maximize the number of correctly predicted base pairs) of RNA secondary structure prediction methods, by improving and refining the parameters of the underlying RNA free energy model. We propose two general approaches to estimate RNA free energy parameters. The Constraint Generation (CG) approach is based on iteratively generating constraints that enforce known structures to have energies lower than other structures for the same molecule. The Boltzmann Likelihood (BL) approach infers a set of RNA free energy parameters which maximize the conditional likelihood of a set of known RNA structures. We discuss several variants and extensions of these two approaches, including a linear Gaussian Bayesian network that defines relationships between features. Overall, BL gives slightly better results than CG, but it is over ten times more expensive to run. In addition, CG requires software that is much simpler to implement. 
We obtain significant improvements in the accuracy of RNA minimum free energy secondary structure prediction with and without pseudoknots (regions of non-nested base pairs), when measured on large sets of RNA molecules with known structures. For the Turner model, which has been the gold-standard model without pseudoknots for more than a decade, the average prediction accuracy of our new parameters increases from 60% to 71%. For two models with pseudoknots, we obtain an increase of 9% and 6%, respectively. To the best of our knowledge, our parameters are currently state-of-the-art for the three considered models. / Science, Faculty of / Computer Science, Department of / Graduate RNA secondary structure prediction RNA energy models
4	Modeling Protein Secondary Structure by Products of Dependent Experts Cumbaa, Christian January 2001 (has links) A phenomenon as complex as protein folding requires a complex model to approximate it. This thesis presents a bottom-up approach for building complex probabilistic models of protein secondary structure by incorporating multiple information sources, which we call experts. Expert opinions are represented by probability distributions over the set of possible structures. Bayesian treatment of a group of experts results in a consensus opinion that combines the experts' probability distributions using the operators of normalized product, quotient and exponentiation. The expression of this consensus opinion simplifies to a product of the expert opinions under two assumptions: (1) balanced training of experts, i.e., uniform prior probability over all structures, and (2) conditional independence between expert opinions, given the structure. This research also studies how Markov chains and hidden Markov models may be used to represent expert opinion. Closure properties are proven, and construction algorithms are given for the product of hidden Markov models, and for the product, quotient and exponentiation of Markov chains. Algorithms for extracting single-structure predictions from these models are also given. Current product-of-experts approaches in machine learning are top-down modeling strategies that assume expert independence and require simultaneous training of all experts. This research describes a bottom-up modeling strategy that can incorporate conditionally dependent experts and assumes separately trained experts. Computer Science probabilistic modeling protein secondary structure prediction expert resolution
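The simplified consensus described in the abstract above (a normalized product of expert opinions under a uniform prior and conditional independence) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `product_of_experts`, the three-state alphabet (H, E, C) and the toy probabilities are assumptions chosen only for illustration.

```python
from math import prod


def product_of_experts(expert_dists):
    """Combine expert distributions by a normalized product.

    Each expert is a dict mapping a structure label to a probability.
    Assumes (as in the simplified case) a uniform prior over structures
    and conditionally independent experts.
    """
    outcomes = expert_dists[0].keys()
    # Multiply the experts' probabilities for each candidate structure.
    unnormalized = {s: prod(d[s] for d in expert_dists) for s in outcomes}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("experts assign zero joint mass to every structure")
    # Renormalize so the consensus is a probability distribution.
    return {s: w / total for s, w in unnormalized.items()}


# Hypothetical experts over helix (H), strand (E) and coil (C).
expert_a = {"H": 0.6, "E": 0.1, "C": 0.3}
expert_b = {"H": 0.5, "E": 0.2, "C": 0.3}
consensus = product_of_experts([expert_a, expert_b])
```

Because the product rewards agreement, the consensus here is sharper on H than either expert alone. The quotient and exponentiation operators mentioned in the abstract, which handle dependent experts, are not covered by this sketch.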
6	RNA secondary structure prediction using a combined method of thermodynamics and kinetics Pan, Minmin 07 July 2011 (has links) RNA is now widely acknowledged to play important roles in information transfer, as a structural component, and in gene regulation, among other functions. The secondary structure of RNA is therefore key to understanding the structure-function relationship. Computational prediction of RNA secondary structure not only provides possible structures, but also elucidates the mechanism of RNA folding. Conventional prediction programs are either derived from an evolutionary perspective or aim to achieve minimum free energy. In vivo, RNA folds during transcription, which indicates that the native RNA structure results from both thermodynamics and kinetics. In this thesis, I first reviewed the current leading kinetic folding programs and demonstrated that these programs are not able to predict secondary structure accurately. Building on that, I proposed a new sequential folding program called GTkinetics. Given an RNA sequence, GTkinetics predicts a secondary structure and a series of RNA folding trajectories. It treats the RNA as a growing chain and adds stable local structures sequentially. It features a Z-score to evaluate the stability of local structures, which is able to locate native local structures with high confidence. Since all stable local structures are captured in GTkinetics, it produces some false positives, which prevent the native structure from forming as the chain grows. This suggests a refolding model to melt the false positive hairpins, probable intermediate structures, and to fold the RNA into a new structure with reliable long-range helices. By analyzing the suboptimal ensemble along the folding pathway, I suggested a refolding mechanism with which it can be evaluated whether or not refolding takes place. 
As another way to favor local structures over long-distance structures, I introduced a distance penalty function into the free energy calculation. I used a sigmoidal function to compute the energy penalty according to the distance in the primary sequence between the two nucleotides of a base pair. For both the training dataset and the test dataset, the distance function improves the prediction to some extent. In order to characterize the differences between local and long-range helices, I carried out an analysis of standardized local nucleotide composition and base pair composition for the two groups. The results show that adenine accumulates on the 5' side of local structures, but not on that of long-range helices. GU base pairs occur significantly more frequently in local helices than in long-range helices. These results indicate that the mechanisms that form local and long-range helices are different, a difference that is encoded in the sequence itself. Based on all the results, I draw conclusions and suggest future directions to enhance the current sequential folding program. MFE RNA secondary structure prediction Kinetics Molecules Genetic transcription
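The sigmoidal distance penalty mentioned in the abstract above can be sketched as follows. The abstract does not give the functional form or its parameters, so the logistic shape and the names `max_penalty`, `midpoint` and `steepness` (with their default values) are assumptions for illustration only.

```python
import math


def distance_penalty(i, j, max_penalty=1.0, midpoint=100.0, steepness=0.05):
    """Energy penalty for a base pair between sequence positions i and j.

    A logistic curve: close to 0 for short-range pairs, rising to
    max_penalty for pairs far apart in the primary sequence.
    All parameter values are hypothetical, not taken from the thesis.
    """
    distance = abs(j - i)
    return max_penalty / (1.0 + math.exp(-steepness * (distance - midpoint)))


# The penalty would be added to the free energy of each candidate base pair.
local_pair = distance_penalty(10, 20)    # small penalty, 10 nt apart
distant_pair = distance_penalty(10, 410)  # near max_penalty, 400 nt apart
```

With this shape, pairs closer than the midpoint are barely penalized while distant pairs approach a fixed maximum, which matches the stated goal of favoring local helices without forbidding long-range ones.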
7	COMPUTER METHODS FOR PRE-MICRORNA SECONDARY STRUCTURE PREDICTION Han, Dianwei 01 January 2012 (has links) This thesis presents a new algorithm to predict pre-microRNA secondary structure. An accurate prediction of the pre-microRNA secondary structure is important in miRNA informatics. Based on a recently proposed model for predicting RNA secondary structure, nucleotide cyclic motifs (NCM), we propose and implement a Modified NCM (MNCM) model with a physics-based scoring strategy to tackle the problem of pre-microRNA folding. Our microRNAfold is implemented using a globally optimal algorithm built on bottom-up locally optimal solutions. It has been shown that studying the functions of multiple genes and predicting the secondary structure of multiple related microRNAs is more important and meaningful, since many polygenic traits in animals and plants can be controlled by more than a single gene. We propose a parallel algorithm based on the master-slave architecture to predict the secondary structure from an input sequence. The experimental results show that our algorithm is able to produce the optimal secondary structure of polycistronic microRNAs. The trend of speedups of our parallel algorithm matches that of the theoretical speedups. Conserved secondary structures are likely to be functional, and secondary structural characteristics that are shared between endogenous pre-miRNAs may contribute toward efficient biogenesis. Identifying conserved secondary structure is therefore meaningful, and identifying conserved characteristics in RNA is an important research field. After the characteristics are extracted from the secondary structures of RNAs, corresponding patterns or rules can be derived and used. We propose to use the conserved microRNA characteristics in two ways: to improve prediction through a knowledge base, and to distinguish real microRNAs from pseudo microRNAs by classification. 
Through statistical analysis of the classification performance, we verify that the conserved characteristics extracted from microRNAs’ secondary structures are precise enough. Gene suppression is a powerful tool for functional genomics and elimination of specific gene products. However, current gene suppression vectors can only be used to silence a single gene at a time. We therefore design an efficient polycistronic microRNA vector, and a web-based tool allows users to design their own microRNA vectors online. Secondary structure prediction pre-microRNA data mining classification clustering Computer Sciences
8	Decision Fusion for Protein Secondary Structure Prediction Akkaladevi, Somasheker 03 August 2006 (has links) Prediction of protein secondary structure from the primary sequence of amino acids is a very challenging task, and the problem has been approached from several angles. Proteins have many different biological functions; they may act as enzymes or as building blocks (muscle fibers) or may have a transport function (e.g., transport of oxygen). The three-dimensional protein structure determines the functional properties of the protein. A lot of interesting work has been done on this problem, and over the last 10 to 20 years the methods have gradually improved in accuracy. In this dissertation we investigate several techniques for predicting protein secondary structure. The prediction is carried out mainly using pattern classification techniques such as neural networks, genetic algorithms, and simulated annealing. Each individual algorithm may work well in certain situations but fail in others. Capitalizing on the positive decisions can be achieved by forcing the various methods to collaborate to reach a unified consensus based on their previous performances. The process of combining classifiers is called decision fusion. Various decision fusion techniques, such as the committee method, the correlation method and Bayesian inference methods, are thoroughly explored in this dissertation as ways to fuse the solutions from various approaches and obtain better prediction accuracy. The RS126 data set was used for training and testing purposes. Applying pattern classification algorithms along with decision fusion techniques improved the prediction accuracy compared to prediction by neural networks or pattern classification algorithms individually or combined with neural networks. This research has shown that decision fusion techniques can be used to obtain better protein secondary structure prediction accuracy. 
Decision Fusion Protein Secondary Structure Prediction Pattern classification algorithms Computer Sciences
9	The Relative Importance of Input Encoding and Learning Methodology on Protein Secondary Structure Prediction Clayton, Arnshea 09 June 2006 (has links) In this thesis the relative importance of input encoding and learning algorithm on protein secondary structure prediction is explored. A novel input encoding, based on multidimensional scaling applied to a recently published amino acid substitution matrix, is developed and shown to be superior to an arbitrary input encoding. Both decimal-valued and binary input encodings are compared. Two neural network learning algorithms, Resilient Propagation and Learning Vector Quantization, which have not previously been applied to the problem of protein secondary structure prediction, are examined. Input encoding is shown to have a greater impact on prediction accuracy than learning methodology, with a binary input encoding providing the highest training and test set prediction accuracy. Neural Networks Learning Vector Quantization Protein Secondary Structure Prediction Resilient Propagation Computer Sciences
10	Improving secondary structure prediction with covariation analysis and structure-based alignment system of RNA sequences Shang, Lei, active 2013 10 February 2014 (has links) RNA molecules form complex higher-order structures which are essential for performing their biological activities. The accurate prediction of an RNA secondary structure and other higher-order structural constraints will significantly enhance the understanding of RNA molecules and help interpret their functions. Covariation analysis is the predominant computational method to accurately predict the base pairs in the secondary structure of RNAs. I developed a novel and powerful covariation method, the Phylogenetic Events Count (PEC) method, to determine positional covariation. Applying the PEC method to a bacterial 16S rRNA sequence alignment shows that it is more sensitive and accurate than other mutual information based methods in identifying base pairs and other structural constraints of the RNA structure. The analysis also uncovers a new type of structural constraint, the neighbor effect, between sets of nucleotides that are in proximity in the three-dimensional RNA structure and show weaker but significant covariation with one another. Utilizing these covariation methods, a proposed secondary structure model of an entire HIV-1 genome RNA is evaluated. The results reveal that the vast majority of the predicted base pairs in the proposed HIV-1 secondary structure model do not have covariation, and thus lack support from comparative analysis. Generating the most accurate multiple sequence alignment is fundamental and essential to performing high-quality comparative analysis. The rapid determination of nucleic acid sequences dramatically increases the number of available sequences. Developing an accurate and rapid alignment program for these RNA sequences has thus become a vital and challenging task in order to decipher the maximum amount of information from the data. 
A template-based RNA sequence alignment system, CRWAlign-2, is developed to accurately align new sequences to an existing reference sequence alignment based on primary and secondary structural similarity. A comparison of CRWAlign-2 with eight alternative widely-used alignment programs reveals that CRWAlign-2 outperforms other programs in aligning new sequences with higher accuracy. In addition to aligning sequences accurately, CRWAlign-2 also creates secondary structure models for each sequence to be aligned, which provides very useful information for the comparative analysis of RNA sequences and structures. The CRWAlign-2 program also provides opportunities for multiple areas including the identification of chimeric 16S rRNA sequences generated in microbiome sequencing projects. / text PEC Comparative analysis Covariation analysis Secondary structure prediction Sequence alignment Structure-based alignment CRWAlign2 HIV
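The abstract above compares PEC against mutual information based covariation methods without describing either in detail. As a rough illustration of that baseline only (not of PEC, whose definition is not given here), the mutual information between two alignment columns can be computed as below. The function name and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, col_a, col_b):
    """Mutual information (in bits) between two columns of an alignment.

    `alignment` is a list of equal-length sequences. High values indicate
    that the two positions vary together, as expected for base pairs.
    """
    n = len(alignment)
    pair_counts = Counter((seq[col_a], seq[col_b]) for seq in alignment)
    a_counts = Counter(seq[col_a] for seq in alignment)
    b_counts = Counter(seq[col_b] for seq in alignment)
    mi = 0.0
    for (x, y), count in pair_counts.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts to limit rounding.
        mi += p_xy * log2(count * n / (a_counts[x] * b_counts[y]))
    return mi


# Toy alignment: columns 0 and 2 covary as Watson-Crick pairs,
# column 1 is fully conserved.
toy_alignment = ["GAC", "CAG", "AAU", "UAA"]
paired = column_mutual_information(toy_alignment, 0, 2)
conserved = column_mutual_information(toy_alignment, 0, 1)
```

Plain mutual information treats every sequence as an independent observation and ignores how the sequences are related, which may be one reason a count of phylogenetic events could behave differently on real alignments.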
