121 |
Simulations numériques de la dynamique des protéines : translation de ligands, flexibilité et dynamique des boucles
St-Pierre, Jean-François 03 1900
Flexibility is an intrinsic characteristic of proteins, which, from the moment of their synthesis as a linear chain of amino acids, have to adopt a folded, enzymatically active three-dimensional structure. Some proteins stay flexible once folded and display large-amplitude conformational changes during their enzymatic cycle. Others contain segments so flexible that their structure cannot be resolved by experimental methods. In this thesis, we present our application of in silico methods to the study of protein flexibility:
• Using steered molecular dynamics and umbrella sampling, we characterized the binding trajectories of the Z-pro-prolinal inhibitor to the prolyl oligopeptidase protein and identified the most probable trajectory. Our simulations also identified a probable ligand recruitment mechanism that involves a flexible loop of 19 amino acids at the interface of the protein's two domains.
• Using traditional and steered molecular dynamics, we examined the stability of the SAV1866 protein in its closed conformation inserted in a lipid membrane and studied one of its possible opening modes, the separation of its nucleotide-binding domains.
• We adapted the activation-relaxation technique ART-nouveau, previously used to study protein folding and aggregation, to the problem of predicting the structure of long flexible loops. Applied to the folding of loops of 8 to 20 amino acids, the method shows a quadratic dependence of execution time on loop length, which makes the study of even longer loops feasible.
|
122 |
Bayesian models and algorithms for protein secondary structure and beta-sheet prediction
Aydin, Zafer 17 September 2008
In this thesis, we developed Bayesian models and machine learning algorithms for protein secondary structure and beta-sheet prediction problems. In protein secondary structure prediction, we developed hidden semi-Markov models, N-best algorithms and training set reduction procedures for proteins in the single-sequence category. We introduced three residue dependency models (both probabilistic and heuristic) incorporating the statistically significant amino acid correlation patterns at structural segment borders. We allowed dependencies on positions outside the segments to relax the condition of segment independence. Another novelty of the models is the dependency on downstream positions, which is important because of the asymmetric correlation patterns observed uniformly in structural segments. Among the dataset reduction methods, we showed that composition-based reduction generated the most accurate results. To incorporate the non-local interactions characteristic of beta-sheets, we developed two N-best algorithms and a Bayesian beta-sheet model. In beta-sheet prediction, we developed a Bayesian model to characterize the conformational organization of beta-sheets and efficient algorithms to compute the optimum architecture, which includes beta-strand pairings, interaction types (parallel or anti-parallel) and residue-residue interactions (contact maps). We introduced a Bayesian model for proteins with six or fewer beta-strands, in which we model the conformational features in a probabilistic framework by combining the amino acid pairing potentials with a priori knowledge of beta-strand organizations. To select the optimum beta-sheet architecture, we analyzed the space of possible conformations with efficient heuristics that significantly reduce the search space by enforcing the amino acid pairs that have strong interaction potentials. For proteins with more than six beta-strands, we first computed beta-strand pairings using the BetaPro method. Then, we computed gapped alignments of the paired beta-strands in parallel and anti-parallel directions and chose the interaction types and beta-residue pairings with maximum alignment scores. Accurate prediction of secondary structure, beta-sheets and non-local contacts should improve the accuracy and quality of three-dimensional structure prediction.
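The last step described above (aligning two paired strands in both directions and keeping the orientation with the higher score) can be illustrated with a small sketch. It assumes a plain global alignment and a toy pairing score based only on hydrophobicity; the names `pair_score`, `align_score` and `best_orientation`, the scores and the gap penalty are all hypothetical and are not the potentials or algorithms used in the thesis or in BetaPro.

```python
# Toy residue set (assumption): residues treated as hydrophobic.
HYDROPHOBIC = set("VILFMWYAC")

def pair_score(a: str, b: str) -> int:
    """Toy pairing potential: reward hydrophobic-hydrophobic pairs."""
    a_h, b_h = a in HYDROPHOBIC, b in HYDROPHOBIC
    if a_h and b_h:
        return 2
    if a_h != b_h:
        return -1
    return 0

def align_score(s1: str, s2: str, gap: int = -2) -> int:
    """Best global gapped alignment score (Needleman-Wunsch, score only)."""
    prev = [j * gap for j in range(len(s2) + 1)]
    for i, a in enumerate(s1, start=1):
        cur = [i * gap]
        for j, b in enumerate(s2, start=1):
            cur.append(max(prev[j - 1] + pair_score(a, b),  # pair a with b
                           prev[j] + gap,                   # gap in s2
                           cur[j - 1] + gap))               # gap in s1
        prev = cur
    return prev[-1]

def best_orientation(s1: str, s2: str) -> tuple[str, int]:
    """Score both directions; the anti-parallel case reverses the second strand."""
    parallel = align_score(s1, s2)
    antiparallel = align_score(s1, s2[::-1])
    if parallel >= antiparallel:
        return "parallel", parallel
    return "anti-parallel", antiparallel

print(best_orientation("VIVKT", "TKVIV"))
```

Reversing one strand before aligning is the only difference between the two cases, so the same scoring routine serves both directions.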
|
123 |
Theoretical studies of compressed xenon oxides, tin selenide thermoelectrics, and defects in graphene
Worth, Nicholas Gower January 2018
Enormous advances in computing power in recent decades have made it possible to perform accurate numerical simulations of a wide range of systems in condensed matter physics. At the forefront of this progress has been density functional theory (DFT), a very popular approach to tackling the complexity of quantum-mechanical systems that very often strikes a good balance between accuracy and tractability in light of the finite computational resources available to researchers. This thesis describes work utilising DFT methods to tackle two distinct problems. The first is the theoretical prediction of stable and metastable periodic structures under specified conditions using the ab initio random structure searching (AIRSS) method, which involves a large-scale exploration of the Born-Oppenheimer energy surface. The second is the use of a vibrational self-consistent field (VSCF) approach to investigate the effects of nuclear motion and anharmonicity in crystal systems, which involves a local exploration of the Born-Oppenheimer energy surface. The AIRSS crystal structure prediction method is here applied to a study of defect structures in graphene. It is also applied to a study of the xenon-oxygen binary system under a range of geological pressures (83–200 GPa). Novel xenon oxide structures are predicted and characterised theoretically. This work was carried out in collaboration with an experimental study of the system at the lower end of the pressure range. The VSCF approach to investigating anharmonicity is here applied to the study of tin selenide (SnSe), a material that has recently been shown to hold considerable promise as a thermoelectric material. In this thesis, the effects of anharmonic nuclear motion on the vibrational and electronic properties of SnSe are investigated quantitatively.
|
124 |
MOIRAE : a computational strategy to predict 3-D structures of polypeptides
Dorn, Márcio January 2012
Currently, one of the main research problems in Structural Bioinformatics is the study and prediction of the 3-D structure of proteins. The genome projects of the 1990s resulted in a large increase in the number of protein sequences. However, the number of identified 3-D protein structures has not followed the same growth trend, and the number of protein sequences is now much higher than the number of known 3-D structures. Many computational methodologies, systems and algorithms have been proposed to address the protein structure prediction problem. However, the problem remains challenging because of the complexity and high dimensionality of the protein conformational search space. This work presents a new computational strategy for the 3-D protein structure prediction problem. A first-principles strategy that uses database information to predict the 3-D structure of polypeptides was developed. The proposed technique manipulates structural information from the PDB in order to generate torsion angle intervals. These intervals are used as input to a genetic algorithm with a local-search operator, which searches the protein conformational space and predicts the 3-D structure. Results show that the 3-D structures obtained by the proposed method were topologically comparable to their corresponding experimental structures.
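The search step described above, a genetic algorithm with a local-search operator restricted to torsion angle intervals, can be sketched as follows. This is only an illustration under stated assumptions: `toy_energy` is a hypothetical stand-in for a real energy function, the intervals are made-up values rather than intervals derived from the PDB, and the operators (uniform crossover, single-angle mutation, hill climbing) are generic choices, not necessarily those of the MOIRAE method.

```python
import random

def toy_energy(angles):
    """Hypothetical energy: prefers angles near -60 degrees. Lower is better."""
    return sum((a + 60.0) ** 2 for a in angles)

def clamp(value, low, high):
    return max(low, min(high, value))

def local_search(individual, intervals, rng, steps=20, step_size=5.0):
    """Hill climbing: keep a small perturbation only if the energy decreases."""
    best = list(individual)
    best_energy = toy_energy(best)
    for _ in range(steps):
        k = rng.randrange(len(best))
        low, high = intervals[k]
        candidate = list(best)
        candidate[k] = clamp(candidate[k] + rng.uniform(-step_size, step_size), low, high)
        energy = toy_energy(candidate)
        if energy < best_energy:
            best, best_energy = candidate, energy
    return best

def memetic_ga(intervals, pop_size=30, generations=25, seed=0):
    """Genetic algorithm whose individuals always stay inside the given intervals."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in intervals] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=toy_energy)
        parents = population[: pop_size // 2]
        children = [list(population[0])]  # elitism: keep the current best
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            k = rng.randrange(len(child))
            child[k] = rng.uniform(*intervals[k])  # mutation within the interval
            children.append(local_search(child, intervals, rng))
        population = children
    return min(population, key=toy_energy)

# Made-up (phi, psi, phi) intervals in degrees, for illustration only.
intervals = [(-90.0, -30.0), (-180.0, -100.0), (20.0, 80.0)]
print(memetic_ga(intervals))
```

Restricting every operator to the intervals is what shrinks the search space: the algorithm never evaluates conformations outside the ranges supplied to it.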
|
126 |
Algoritmos evolutivos para predição de estruturas de proteínas / Evolutionary algorithms, to proteins structures prediction
Telma Woerle de Lima 01 September 2006
Determining the three-dimensional structure of proteins (DEP, from the Portuguese "Determinação da Estrutura tridimensional de Proteínas") from their amino acid sequence is important for protein engineering and the development of new drugs. One alternative for this problem has been the application of evolutionary computation techniques. Approaches using Evolutionary Algorithms (EAs) have obtained relevant results, but they are restricted to small proteins, with tens of amino acids, and to some protein classes. This work investigates an approach that uses EAs to predict the tertiary structure of proteins independently of their size and class. The results show that, despite the difficulties encountered, the investigated approach is an alternative to classical methods of determining protein tertiary structure.
|
127 |
MDAPSP - Uma arquitetura modular distribuída para auxílio à predição de estruturas de proteínas / MDAPSP - A modular distributed architecture to support the protein structure prediction
Edvard Martins de Oliveira 09 May 2018
Protein Structure Prediction (PSP) is a research field that seeks to simulate the folding of amino acid chains in order to discover the functions of proteins in nature, a process that is highly expensive to carry out by in vivo methods. Within Bioinformatics, it is one of the most computationally demanding and challenging tasks today. Because of this complexity, many studies use scientific gateways to provide tools for running and analysing such experiments, along with scientific workflows to organize tasks and share information. However, these gateways can suffer from performance bottlenecks and structural failures, producing low-quality results. To address this multifaceted context and offer alternatives to some of these limitations, this thesis proposes a modular architecture based on Service Oriented Architecture (SOA) concepts to provide computing resources to scientific gateways, with a focus on PSP experiments. The Modular Distributed Architecture to support Protein Structure Prediction (MDAPSP) is described conceptually and validated in a computer simulation model that identifies its capabilities, details the operation of its modules and highlights its potential. The experimental evaluation demonstrates the quality of the proposed algorithms, which increase the service capacity of a scientific gateway, reduce the time needed for prediction experiments and lay the foundations for a prototype of a functional architecture. The developed modules optimize PSP experiments well in distributed environments and are a novelty in the resource provisioning model for scientific gateways.
|
128 |
Méthodes pour l'inférence en grande dimension avec des données corrélées : application à des données génomiques / Methods for statistical inference on correlated data : application to genomic data
Leonardis, Eleonora De 26 October 2015
The availability of huge amounts of data has changed the role of physics with respect to other disciplines. In this dissertation I explore the innovations that statistical physics approaches have introduced in molecular biology. Over the last 20 years the size of genome databases has increased exponentially, so the exploitation of raw data to extract information has become a major topic in statistical physics. After the success in protein structure prediction, surprisingly good results have finally been achieved in the related field of RNA structure characterisation as well. However, recent studies have revealed that, even though databases keep growing, inference is often performed in the undersampling regime, and new computational schemes are needed to overcome this intrinsic limitation of real data. This dissertation discusses inference methods and their application to RNA structure prediction. It examines some heuristic approaches that have been applied successfully in recent years even though they are poorly understood theoretically. The last part of the work focuses on the development of a tool for the inference of generative models, in the hope that it will pave the way towards novel applications.
|
129 |
Algoritmos de estimação de distribuição para predição ab initio de estruturas de proteínas / Estimation of distribution algorithms for ab initio protein structure prediction
Daniel Rodrigo Ferraz Bonetti 05 March 2015
Proteins are molecules that perform functions essential to life. To understand the function of a protein, its three-dimensional structure must be known. However, determining a protein structure can be an expensive and time-consuming process that requires highly skilled professionals. To overcome this limitation, computational methods for Protein Structure Prediction (PSP) have been investigated that aim to predict a protein's structure from its amino acid sequence. Most of these methods use knowledge of structures already determined experimentally to predict proteins of unknown structure. Although computational methods such as Rosetta, I-Tasser and Quark have been successful in their predictions, they can only produce structures that closely resemble those already determined experimentally, and their use of prior knowledge of other structures may bias their predictions. To develop an efficient, bias-free algorithm for PSP, we developed an Estimation of Distribution Algorithm (EDA) specific to this problem, with full-atom modeling and ab initio algorithms. An ab initio algorithm is of particular interest for proteins with low similarity to known structures. Three types of probabilistic model were developed: univariate, bivariate and hierarchical. The univariate model deals with the multi-modality of the distribution of a single variable. The bivariate model treats the dihedral angles (Φ, Ψ) of the same amino acid as correlated variables. The hierarchical model splits the problem into subproblems and attempts to treat them separately. The experiments show that better results can be obtained when the bivariate (Φ, Ψ) correlation is modeled. The hierarchical model also improved the results, mainly for proteins with more than 50 residues. In addition, we compared the proposed techniques with other metaheuristics from the literature: Random Walk, Monte Carlo, Genetic Algorithm and Differential Evolution. The results show that even a less efficient metaheuristic such as Random Walk can find the correct structure, but only by using a large amount of prior knowledge (so the prediction may be biased). In contrast, the proposed EDA obtained the expected protein structure without using any prior knowledge, which makes it a purely ab initio (bias-free) prediction.
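The univariate model mentioned above can be illustrated with a minimal sketch of an estimation of distribution algorithm in which each torsion angle gets its own histogram, so that a multi-modal distribution can be represented. Everything here is an assumption made for illustration: the names, the bin width, the population sizes, and especially `toy_energy`, which measures distance to a fixed target only to keep the example self-contained, whereas the thesis uses full-atom energies and no known structure.

```python
import random

N_BINS = 36  # 10-degree bins over [-180, 180)

def toy_energy(angles, target):
    """Hypothetical stand-in for a force field: squared circular distance to a target."""
    total = 0.0
    for a, t in zip(angles, target):
        d = abs(a - t) % 360.0
        d = min(d, 360.0 - d)
        total += d * d
    return total

def sample_from_histogram(counts, rng):
    """Pick a bin by frequency (+1 smoothing keeps every bin reachable), then a point in it."""
    weights = [c + 1 for c in counts]
    b = rng.choices(range(N_BINS), weights=weights)[0]
    width = 360.0 / N_BINS
    return -180.0 + (b + rng.random()) * width

def univariate_eda(target, pop_size=60, n_select=20, generations=30, seed=0):
    rng = random.Random(seed)
    n = len(target)
    energy = lambda ind: toy_energy(ind, target)
    population = [[rng.uniform(-180.0, 180.0) for _ in range(n)] for _ in range(pop_size)]
    best = min(population, key=energy)
    history = [energy(best)]
    for _ in range(generations):
        # 1. Select the lowest-energy individuals.
        selected = sorted(population, key=energy)[:n_select]
        # 2. Estimate one independent histogram per angle.
        histograms = [[0] * N_BINS for _ in range(n)]
        for ind in selected:
            for k, a in enumerate(ind):
                b = min(int((a + 180.0) / 360.0 * N_BINS), N_BINS - 1)
                histograms[k][b] += 1
        # 3. Sample a new population from the estimated model.
        population = [[sample_from_histogram(histograms[k], rng) for k in range(n)]
                      for _ in range(pop_size)]
        challenger = min(population, key=energy)
        if energy(challenger) < energy(best):
            best = challenger
        history.append(energy(best))
    return best, history

best, history = univariate_eda([-60.0, -45.0, -120.0, 130.0])
print(history[0], history[-1])
```

A bivariate variant would replace the per-angle histograms with a joint two-dimensional histogram over each residue's (Φ, Ψ) pair, which is the kind of correlation the abstract reports as beneficial.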
|
130 |
Protein secondary structure prediction using amino acid regularities
Senekal, Frederick Petrus 23 January 2009
The protein folding problem is examined. Specifically, the problem of predicting protein secondary structure from the amino acid sequence is investigated. A literature study of the protein folding process and of the techniques that currently exist to predict protein secondary structures is presented. These techniques include the use of expert rules, statistics, information theory and various computational intelligence techniques, such as neural networks, nearest neighbour methods, Hidden Markov Models and Support Vector Machines. A pattern recognition technique based on statistical analysis is developed to predict protein secondary structure from the amino acid sequence. The technique can be applied to any problem where an input pattern is associated with an output pattern and each element in both the input and output patterns can take its value from a set with finite cardinality. The technique is applied to discover the role that small sequences of amino acids play in the formation of protein secondary structures. By applying the technique, a performance score of Q8 = 59.2% is achieved, with a corresponding Q3 score of 69.7%. This compares well with state-of-the-art techniques, such as OSS-HMM and PSIPRED, which achieve Q3 scores of 67.9% and 66.8% respectively when predictions are made on single sequences. / Dissertation (MEng)--University of Pretoria, 2009. / Electrical, Electronic and Computer Engineering / unrestricted
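For readers unfamiliar with the Q8 and Q3 scores quoted above, the following sketch shows how such per-residue accuracies are commonly computed. It assumes the widely used reduction of the eight DSSP states to three (H, G, I to helix; E, B to strand; everything else to coil); the abstract does not state which reduction the dissertation used, and the function names and example strings are hypothetical.

```python
# Assumed 8-to-3 state reduction; other conventions exist.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def reduce_to_3(states: str) -> str:
    """Map an 8-state string to 3 states; unlisted states become coil (C)."""
    return "".join(DSSP_TO_3.get(s, "C") for s in states)

def q_score(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)

# Made-up example strings, one character per residue.
observed = "HHHHGGTTEEEEBSCC"
predicted = "HHHHHHTTEEEECSCC"

q8 = q_score(predicted, observed)
q3 = q_score(reduce_to_3(predicted), reduce_to_3(observed))
print(f"Q8 = {q8:.2f}%, Q3 = {q3:.2f}%")
```

Because several eight-state errors collapse into the same three-state class, Q3 is never lower than Q8 for the same prediction, which is consistent with the pair of scores reported in the abstract.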
|