Global ETD Search

41	COMPUTER METHODS FOR PRE-MICRORNA SECONDARY STRUCTURE PREDICTION Han, Dianwei 01 January 2012 This thesis presents a new algorithm for predicting pre-microRNA secondary structure. Accurate prediction of the pre-microRNA secondary structure is important in miRNA informatics. Building on nucleotide cyclic motifs (NCM), a recently proposed model for predicting RNA secondary structure, we propose and implement a Modified NCM (MNCM) model with a physics-based scoring strategy to tackle the problem of pre-microRNA folding. Our microRNAfold is implemented as a globally optimal algorithm built bottom-up from locally optimal solutions. It has been shown that studying the functions of multiple genes and predicting the secondary structure of multiple related microRNAs is more important and meaningful, since many polygenic traits in animals and plants are controlled by more than a single gene. We propose a parallel algorithm based on the master-slave architecture to predict the secondary structure from an input sequence. The experimental results show that our algorithm is able to produce the optimal secondary structure of polycistronic microRNAs, and the trend of its speedups matches that of the theoretical speedups. Conserved secondary structures are likely to be functional, and secondary structural characteristics shared between endogenous pre-miRNAs may contribute toward efficient biogenesis. Identifying conserved secondary structures, and conserved characteristics in RNA more generally, is therefore an important research field. Once the characteristics are extracted from the secondary structures of RNAs, corresponding patterns or rules can be derived and used. We propose to use the conserved microRNA characteristics in two ways: to improve prediction through a knowledge base, and to distinguish real microRNAs from pseudo microRNAs.
Through statistical analysis of the classification performance, we verify that the conserved characteristics extracted from microRNAs’ secondary structures are precise enough. Gene suppression is a powerful tool for functional genomics and for the elimination of specific gene products. However, current gene suppression vectors can only silence a single gene at a time. We therefore design an efficient polycistronic microRNA vector, together with a web-based tool that allows users to design their own microRNA vectors online. Secondary structure prediction pre-microRNA data mining classification clustering Computer Sciences
42	From Sequence to Structure : Using predicted residue contacts to facilitate template-free protein structure prediction Michel, Mirco January 2017 Despite the fundamental role of experimental protein structure determination, computational methods are of essential importance to bridge the ever growing gap between available protein sequence and structure data. Common structure prediction methods rely on experimental data, which is not available for about half of the known protein families. Recent advancements in amino acid contact prediction have revolutionized the field of protein structure prediction. Contacts can be used to guide template-free structure predictions that do not rely on experimentally solved structures of homologous proteins. Such methods are now able to produce accurate models for a wide range of protein families. We developed PconsC2, an approach that improved existing contact prediction methods by recognizing intra-molecular contact patterns and reducing noise. An inherent problem of contact prediction based on maximum entropy models is that large alignments with over 1000 effective sequences are needed to infer contacts accurately. These are, however, not available for more than 80% of all protein families that do not have a representative structure in PDB. With PconsC3, we could extend the applicability of contact prediction to families as small as 100 effective sequences by combining global inference methods with machine learning based on local pairwise measures. By introducing PconsFold, a pipeline for contact-based structure prediction, we could show that improvements in contact prediction accuracy translate to more accurate models. Finally, we applied a similar technique to Pfam, a comprehensive database of known protein families. In addition to using a faster folding protocol, we employed model quality assessment methods, which are crucial for estimating the confidence in the accuracy of predicted models.
We propose models to be accurate for 558 families that do not have a representative known structure. Out of those, over 75% have not been reported before. / At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 2: Submitted. Paper 4: In press. protein bioinformatics protein structure prediction contact prediction machine learning Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
43	Structure Determination and Prediction of Zeolites : A Combined Study by Electron Diffraction, Powder X-Ray Diffraction and Database Mining Guo, Peng January 2016 Zeolites are crystalline microporous aluminosilicates with well-defined cavities or channels of molecular dimensions. They are widely used for applications such as gas adsorption, gas storage, ion exchange and catalysis. The size of the pore opening allows zeolites to be categorized into small, medium, large and extra-large pore zeolites. A typical zeolite is the small pore silicoaluminophosphate SAPO-34, which is an important catalyst in the MTO (methanol-to-olefin) process. The properties of zeolite catalysts are determined mainly by their structures, and it is therefore important to know the structures of these materials in order to understand their properties and explore new applications. Single crystal X-ray diffraction has been the main technique used to determine the structures of unknown crystalline materials such as zeolites. This technique, however, can be used only if crystals larger than several micrometres are available. Powder X-ray diffraction (PXRD) is an alternative technique to determine the structures if only small crystals are available. However, peak overlap, poor crystallinity and the presence of impurities hinder the solution of structures from PXRD data. Electron crystallography can overcome these problems. We have developed a new method, which we have called “rotation electron diffraction” (RED), for the automated collection and processing of three-dimensional electron diffraction data. This thesis describes how the RED method has been applied to determine the structures of several zeolites and zeolite-related materials. These include two interlayer expanded silicates (COE-3 and COE-4), a new layered zeolitic fluoroaluminophosphate (EMM-9), a new borosilicate (EMM-26), and an aluminosilicate (ZSM-25).
We have developed a new approach based on strong reflections, and used it to determine the structure of ZSM-25, and to predict the structures of a series of complex zeolites in the RHO family. We propose a new structural principle that describes a series of structurally related zeolites known as “embedded isoreticular zeolite structures”, which have expanding unit cells. The thesis also summarizes several common structural features of zeolites in the Database of Zeolite Structures. / At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 2: Manuscript. Paper 3: Manuscript. zeolites rotation electron diffraction structure determination structure prediction strong reflections approach
44	Função de avaliação dinâmica em algoritmos genéticos aplicados na predição de estruturas tridimensionais de proteínas / Genetic Algorithms with Dynamic Fitness Functions Applied to Tridimensional Protein Structure Prediction Luís Henrique Uchida Ishivatari 28 September 2012 Protein structure prediction can be viewed computationally as an optimization problem: given an amino acid sequence, the three-dimensional structure of the protein must be found among the many possible ones by locating minima of energy functions. Many researchers have proposed Evolutionary Computation strategies to determine the three-dimensional structures of proteins; however, the results are not always encouraging since, among other factors, there is a large number of local optima in the search space. Usually, the fitness functions used by optimization algorithms are based on force fields with different energy terms, whose parameters are adjusted a priori and kept static throughout the optimization process. Some researchers suggest that the use of dynamic fitness functions, i.e., functions that change during the evolutionary optimization process, can help populations escape from local optima in highly multimodal problems. In this work we propose that the parameters of the force field terms be modified during the optimization process carried out by Genetic Algorithms (GAs) in the protein structure prediction problem, being increased or decreased, for instance, according to their influence on the formation of secondary structures and on their fine tuning.
Since the evaluation function is modified during the optimization process, three-dimensional protein structure prediction becomes a dynamic optimization problem, and Genetic Algorithms designed for this kind of problem, such as the hypermutation GA and random immigrants GAs, are investigated here. We also propose a new metric related to the alignment of the protein's secondary structure to help analyze the data obtained. The experimental results indicate that the algorithms with a dynamic evaluation function obtained better results than the static algorithms, which is explained by the fact that changes in the fitness function allow the population to escape local optima and increase population diversity. genetic algorithms protein protein structure prediction
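The abstract of entry 44 describes a Genetic Algorithm whose fitness function changes during the run and which injects random immigrants to preserve diversity. The following is only a minimal sketch of that idea on a toy bit-string problem, not the thesis implementation: the two-term `energy`, the linear `dynamic_weights` schedule, and all parameter values are assumptions made for illustration and do not correspond to any real force field.

```python
import random


def dynamic_weights(generation, total_generations):
    """Hypothetical schedule: shift weight linearly from term A to term B."""
    t = generation / max(1, total_generations - 1)
    return 1.0 - t, t


def energy(individual, weights):
    """Toy two-term 'energy' to minimise (a stand-in, not a real force field)."""
    w_a, w_b = weights
    term_a = sum(individual)  # number of ones
    term_b = sum(1 for x, y in zip(individual, individual[1:]) if x != y)  # adjacent changes
    return w_a * term_a + w_b * term_b


def evolve(pop_size=20, length=16, generations=30, immigrant_rate=0.2, seed=0):
    """Run a GA with a time-varying fitness function and random immigrants."""
    rng = random.Random(seed)

    def new_individual():
        return [rng.randint(0, 1) for _ in range(length)]

    population = [new_individual() for _ in range(pop_size)]
    n_immigrants = int(immigrant_rate * pop_size)
    for g in range(generations):
        # The fitness function itself changes from one generation to the next.
        weights = dynamic_weights(g, generations)
        population.sort(key=lambda ind: energy(ind, weights))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) + n_immigrants < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]          # one-point crossover
            child[rng.randrange(length)] ^= 1  # flip one bit as mutation
            children.append(child)
        # Random immigrants: fresh random individuals replace part of the population.
        immigrants = [new_individual() for _ in range(n_immigrants)]
        population = survivors + children + immigrants
    final_weights = dynamic_weights(generations - 1, generations)
    best = min(population, key=lambda ind: energy(ind, final_weights))
    return best, final_weights


if __name__ == "__main__":
    best, weights = evolve()
    print(best, energy(best, weights))
```

Because the weights change every generation, an individual that was optimal early on can become suboptimal later, which is what makes this a dynamic optimization problem in the sense of the abstract.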
46	Formulation of Hybrid Knowledge-Based/Molecular Mechanics Potentials for Protein Structure Refinement and a Novel Graph Theoretical Protein Structure Comparison and Analysis Technique Maus, Aaron 05 August 2019 Proteins are the fundamental machinery that enables the functions of life. It is critical to understand them not just for basic biology, but also to enable medical advances. The field of protein structure prediction is concerned with developing computational techniques to predict protein structure and function from a protein’s amino acid sequence alone, which is encoded directly in DNA. Despite much progress since the first computational models in the late 1960s, techniques for the prediction of protein structure still cannot reliably produce structures of high enough accuracy to enable desired applications such as rational drug design. Protein structure refinement is the process of modifying a predicted model of a protein to bring it closer to its native state. In this dissertation, one protein structure refinement technique, potential energy minimization using hybrid molecular mechanics/knowledge-based potential energy functions, is examined in detail. The generation of the knowledge-based component is critically analyzed, and in the end, a potential that is a modest improvement over the original is presented. This dissertation also examines the task of protein structure comparison. In evaluating various protein structure prediction techniques, it is crucial to be able to compare produced models against known structures to understand how well the technique performs. A novel technique is proposed that allows an in-depth yet intuitive evaluation of the local similarities between protein structures. Based on a graph analysis of pairwise atomic distance similarities, multiple regions of structural similarity can be identified between structures independently of relative orientation.
Multidomain structures can be evaluated and this technique can be combined with global measures of similarity such as the global distance test. This method of comparison is expected to have broad applications in rational drug design, the evolutionary study of protein structures, and in the analysis of the protein structure prediction effort. Bioinformatics Protein Structure Prediction Protein Structure Refinement Statistical Energy Functions Protein Structure Comparison Graph Analysis Bioinformatics
47	Identification and classification of ncRNA molecules using graph properties Childs, Liam, Nikoloski, Zoran, May, Patrick, Walther, Dirk January 2009 The study of non-coding RNA genes has received increased attention in recent years, fuelled by accumulating evidence that larger portions of genomes than previously acknowledged are transcribed into RNA molecules of mostly unknown function, as well as by the discovery of novel non-coding RNA types and functional RNA elements. Here, we demonstrate that specific properties of graphs that represent the predicted RNA secondary structure reflect functional information. We introduce a computational algorithm and an associated web-based tool (GraPPLE) for classifying non-coding RNA molecules as functional and, furthermore, into Rfam families based on their graph properties. Compared with sequence-similarity-based methods and covariance models, GraPPLE is demonstrated to be more robust with regard to increasing sequence divergence, and when combined with existing methods, it leads to a significant improvement in prediction accuracy. Furthermore, the graph properties identified as most informative are shown to provide an understanding of which particular structural features render RNA molecules functional. Thus, GraPPLE may offer a valuable computational filtering tool to identify potentially interesting RNA molecules among large candidate datasets. RNA secondary structure Noncoding RNAs Structure prediction Gene-expression Structured RNAs Life sciences
48	Decision Fusion for Protein Secondary Structure Prediction Akkaladevi, Somasheker 03 August 2006 Prediction of protein secondary structure from the primary sequence of amino acids is a very challenging task, and the problem has been approached from several angles. Proteins have many different biological functions; they may act as enzymes or as building blocks (muscle fibers) or may have a transport function (e.g., transport of oxygen). The three-dimensional protein structure determines the functional properties of the protein. A lot of interesting work has been done on this problem, and over the last 10 to 20 years the methods have gradually improved in accuracy. In this dissertation we investigate several techniques for predicting protein secondary structure. The prediction is carried out mainly using pattern classification techniques such as neural networks, genetic algorithms, and simulated annealing. Each individual algorithm may work well in certain situations but fail in others. The positive decisions can be capitalized on by making the various methods collaborate to reach a unified consensus based on their previous performances. The process of combining classifiers is called decision fusion. Various decision fusion techniques, such as the committee method, the correlation method and Bayesian inference methods, are thoroughly explored in this dissertation as ways to fuse the solutions from the different approaches and obtain better prediction accuracy. The RS126 data set was used for training and testing. Applying pattern classification algorithms together with decision fusion techniques improved prediction accuracy compared with prediction by neural networks or pattern classification algorithms individually or combined with neural networks. This research has shown that decision fusion techniques can be used to obtain better protein secondary structure prediction accuracy.
Decision Fusion Protein Secondary Structure Prediction Pattern classification algorithms Computer Sciences
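Entry 48 describes decision fusion as making several predictors reach a consensus based on their previous performances. A minimal sketch of one such scheme, a weighted committee vote over per-residue labels, might look as follows. The function name `fuse_predictions`, the H/E/C label alphabet, and the use of prior accuracy as the vote weight are assumptions for illustration; the dissertation's actual committee, correlation and Bayesian methods are not specified in the abstract.

```python
from collections import defaultdict


def fuse_predictions(predictions, weights):
    """Fuse per-residue label strings by weighted vote.

    predictions: equal-length strings, one per predictor (e.g. over H/E/C).
    weights: one number per predictor, e.g. its accuracy on earlier data.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per predictor")
    if len(set(map(len, predictions))) != 1:
        raise ValueError("predictions must be non-empty and of equal length")
    fused = []
    for column in zip(*predictions):
        votes = defaultdict(float)
        for label, weight in zip(column, weights):
            votes[label] += weight
        # Ties go to the alphabetically first label, so the result is deterministic.
        fused.append(max(sorted(votes), key=votes.get))
    return "".join(fused)


if __name__ == "__main__":
    print(fuse_predictions(["HHEC", "HEEC", "CHEC"], [0.7, 0.6, 0.5]))
```

With equal weights this reduces to a plain majority vote; unequal weights let a historically more accurate predictor outvote a weaker one when they disagree.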
49	Machine Learning and Graph Theory Approaches for Classification and Prediction of Protein Structure Altun, Gulsah 22 April 2008 Recently, many methods have been proposed for classification and prediction problems in bioinformatics. One of these problems is protein structure prediction. Machine learning approaches and new algorithms have been proposed to solve this problem. Among the machine learning approaches, Support Vector Machines (SVM) have attracted a lot of attention due to their high prediction accuracy. Since protein data consists of sequence and structural information, another widely used approach for modeling this structured data is to use graphs. In computer science, graph theory has been widely studied; however, it has only recently been applied to bioinformatics. In this work, we introduce new algorithms based on statistical methods, graph theory concepts and machine learning for the protein structure prediction problem. A new statistical method based on z-scores is introduced for seed selection in proteins. A new method for feature selection, based on finding common cliques in protein data, is also introduced; it reduces noise in the data. We also introduce new binary classifiers for the prediction of structural transitions in proteins. These new binary classifiers achieve much higher accuracy than current traditional binary classifiers. protein structure prediction feature selection support vector machines graph theory machine learning algorithm Computer Sciences
50	The Relative Importance of Input Encoding and Learning Methodology on Protein Secondary Structure Prediction Clayton, Arnshea 09 June 2006 In this thesis the relative importance of input encoding and learning algorithm on protein secondary structure prediction is explored. A novel input encoding, based on multidimensional scaling applied to a recently published amino acid substitution matrix, is developed and shown to be superior to an arbitrary input encoding. Both decimal-valued and binary input encodings are compared. Two neural network learning algorithms, Resilient Propagation and Learning Vector Quantization, which have not previously been applied to the problem of protein secondary structure prediction, are examined. Input encoding is shown to have a greater impact on prediction accuracy than learning methodology, with a binary input encoding providing the highest training and test set prediction accuracy. Neural Networks Learning Vector Quantization Protein Secondary Structure Prediction Resilient Propagation Computer Sciences
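Entry 50 compares binary and decimal-valued input encodings for secondary structure predictors. As a rough illustration of what a binary encoding can look like, the sketch below one-hot encodes a sliding window of residues. The window size, the 20-letter alphabet ordering, the extra padding/unknown slot, and the names `one_hot` and `encode_windows` are assumptions made here, not details taken from the thesis.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in an assumed order
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
SLOTS = len(AMINO_ACIDS) + 1  # one extra slot for padding or unknown residues


def one_hot(residue):
    """Return a 21-element binary vector with a single 1 marking the residue."""
    vec = [0] * SLOTS
    vec[INDEX.get(residue, SLOTS - 1)] = 1
    return vec


def encode_windows(sequence, window=5):
    """Encode each position as the concatenated one-hot vectors of its window."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a residue")
    half = window // 2
    padded = "-" * half + sequence + "-" * half  # '-' maps to the padding slot
    return [
        [bit for residue in padded[i : i + window] for bit in one_hot(residue)]
        for i in range(len(sequence))
    ]


if __name__ == "__main__":
    vectors = encode_windows("ACD", window=3)
    print(len(vectors), len(vectors[0]))
```

A decimal-valued encoding would replace `one_hot` with a function returning a few real-valued coordinates per residue, which shortens the input vector at the cost of imposing a geometry on the amino acids.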
