141 |
Redes neurais residuais profundas e autômatos celulares como modelos para predição que fornecem informação sobre a formação de estruturas secundárias proteicas / Residual neural networks and cellular automata as protein secondary structure prediction models with information about folding
Pereira, José Geraldo de Carvalho, 15 March 2018
The self-organization of protein structure from the amino acid chain is known as folding. Although the three-dimensional structures of many proteins are known, for most of them we do not understand well enough how the structure is organized from the amino acid sequence to describe the process in detail. It is well known that the formation of local structural nuclei, known as secondary structure, plays a fundamental role in the final folding of the protein. Methods that predict not only the secondary structure adopted by a given residue but also how that process unfolds over time are therefore highly relevant to several areas of structural biology. In this work, we developed two secondary structure prediction methods using models that can potentially provide more detailed information about the prediction process. One model was built using cellular automata, a type of dynamic model from which both spatial and temporal information can be obtained. The other model was developed using deep residual neural networks. Spatial and probabilistic information can be extracted from the multiple internal convolutional layers of this model, and this information appears to reflect, in some sense, the states of secondary structure formation during folding. The prediction accuracy of this model was ~78% for residues whose assigned structure agreed across the DSSP, STRIDE, KAKSI and PROSS methods. Although lower than that of PSIPRED, which uses PSSM matrices as input, this accuracy is higher than that of other methods that predict secondary structure directly from the amino acid sequence alone.
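The headline figure above is an accuracy restricted to residues on which the four assignment methods agree. A minimal sketch of that kind of metric follows, assuming every assignment has already been reduced to the same three-state alphabet (H, E, C) and aligned residue by residue with the prediction; the function name and the reduction step are assumptions, not taken from the thesis.

```python
def consensus_accuracy(predicted, assignments):
    """Accuracy restricted to residues where every assignment method agrees.

    predicted   -- string of predicted states, one per residue (e.g. "HHEC")
    assignments -- list of strings from different assignment methods
                   (e.g. DSSP, STRIDE, KAKSI, PROSS), assumed to be already
                   reduced to the same 3-state alphabet and aligned
    Returns (accuracy, number_of_consensus_residues).
    """
    if any(len(a) != len(predicted) for a in assignments):
        raise ValueError("all sequences must have the same length")
    correct = total = 0
    for i, state in enumerate(predicted):
        reference = {a[i] for a in assignments}
        if len(reference) != 1:      # methods disagree: residue is excluded
            continue
        total += 1
        correct += state == reference.pop()
    return (correct / total if total else 0.0), total
```

With four methods that disagree only at the third residue, that residue is simply left out of both the numerator and the denominator.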
|
142 |
Predição de estrutura terciária de proteínas com técnicas multiobjetivo no algoritmo de monte carlo / Protein tertiary structure prediction with multi-objective techniques in the Monte Carlo algorithm
Almeida, Alexandre Barbosa de, 17 June 2016
Funding: Conselho Nacional de Pesquisa e Desenvolvimento Científico e Tecnológico (CNPq).
Proteins are vital for the biological functions of all living beings on Earth. However, they are biologically active only in their native structure, which is a state of minimum energy. Protein functionality therefore depends almost exclusively on the size and shape of the native conformation. Yet less than 1% of all known proteins have had their structures solved, and various methods for determining protein structures have been proposed, both in vitro and in silico. This work proposes a new in silico method, Monte Carlo with Dominance, which addresses protein structure prediction from an ab initio, multi-objective optimization perspective, considering the energetic and structural aspects of the protein simultaneously. GROMACS was used for the ab initio treatment to perform Molecular Dynamics simulations, while the ProtPred-GROMACS (2PG) framework, which employs genetic algorithms as heuristic techniques, was used for the multi-objective optimization. In this sense, Monte Carlo with Dominance is a variant of the traditional Monte Carlo Metropolis method. The aim is to check whether protein tertiary structure prediction improves when structural aspects are also taken into account. The energy criterion of Metropolis and the energy and structural criteria of Dominance were compared by computing the RMSD between the predicted and native structures. Monte Carlo with Dominance obtained better solutions for two of the three proteins analyzed, reaching a difference of about 53% relative to the Metropolis prediction.
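To make the contrast between the two acceptance criteria concrete, here is a minimal sketch. The Metropolis test is the standard one. The dominance rule (accept any candidate that is not Pareto-dominated by the current state, otherwise fall back to a Metropolis test on energy) is a hypothetical stand-in, because the abstract does not say how Monte Carlo with Dominance treats dominated moves. The `rmsd` helper assumes the two structures are already superimposed and omits the alignment step.

```python
import math
import random


def metropolis_accept(e_current, e_candidate, kT, rng=random):
    """Standard Metropolis criterion on a single energy objective."""
    delta = e_candidate - e_current
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / kT)


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def dominance_accept(f_current, f_candidate, kT, rng=random):
    """Hypothetical dominance-based acceptance.

    f_* are tuples of objectives to minimize, e.g. (energy, structural_score).
    Non-dominated candidates are always accepted; dominated ones get a
    Metropolis test on the first objective (energy).
    """
    if not dominates(f_current, f_candidate):
        return True
    return metropolis_accept(f_current[0], f_candidate[0], kT, rng)


def rmsd(coords_a, coords_b):
    """RMSD between two already-superimposed lists of (x, y, z) coordinates."""
    sq = sum((p - q) ** 2 for a, b in zip(coords_a, coords_b) for p, q in zip(a, b))
    return math.sqrt(sq / len(coords_a))
```

A candidate that is worse in energy but better in the structural score is accepted outright by this rule, whereas plain Metropolis would accept it only with probability exp(-ΔE/kT).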
|
143 |
Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision
Täckström, Oscar, January 2013
Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the linguistic structure of interest. However, such complete supervision is currently only available for the world's major languages, in a limited number of domains and for a limited range of tasks. As an alternative, this dissertation considers methods for linguistic structure prediction that can make use of incomplete and cross-lingual supervision, with the prospect of making linguistic processing tools more widely available at a lower cost. An overarching theme of this work is the use of structured discriminative latent variable models for learning with indirect and ambiguous supervision; as instantiated, these models admit rich model features while retaining efficient learning and inference properties. The first contribution to this end is a latent-variable model for fine-grained sentiment analysis with coarse-grained indirect supervision. The second is a model for cross-lingual word-cluster induction and the application thereof to cross-lingual model transfer. The third is a method for adapting multi-source discriminative cross-lingual transfer models to target languages, by means of typologically informed selective parameter sharing. The fourth is an ambiguity-aware self- and ensemble-training algorithm, which is applied to target language adaptation and relexicalization of delexicalized cross-lingual transfer parsers. The fifth is a set of sequence-labeling models that combine constraints at the level of tokens and types, and an instantiation of these models for part-of-speech tagging with incomplete cross-lingual and crowdsourced supervision. In addition to these contributions, comprehensive overviews are provided of structured prediction with no or incomplete supervision, as well as of learning in the multilingual and cross-lingual settings. 
Through careful empirical evaluation, it is established that the proposed methods can be used to create substantially more accurate tools for linguistic processing than both unsupervised methods and recently proposed cross-lingual methods. The empirical support for this claim is particularly strong in the latter case: our models for syntactic dependency parsing and part-of-speech tagging achieve the best published results to date for a large number of target languages, in the setting where no annotated training data is available in the target language.
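The combination of token-level and type-level constraints mentioned in the fifth contribution can be pictured as pruning a tagging lattice. The sketch below is a simplified illustration, not the dissertation's model: it assumes a type-level tag dictionary, optional token-level tags projected from an aligned source sentence, and the 12-tag universal tag set; a projected tag is kept only when the dictionary allows it, and unknown words may take any tag.

```python
# Universal part-of-speech tag set (assumed tag inventory)
ALL_TAGS = frozenset({"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET", "ADP",
                      "NUM", "CONJ", "PRT", ".", "X"})


def allowed_tags(tokens, tag_dictionary, projected):
    """Return, for each token, the set of tags a tagger may still consider.

    tag_dictionary -- word type (lower-cased) -> set of tags (type constraint)
    projected      -- per-token tag projected from an aligned source sentence,
                      or None where no projection exists (token constraint)
    """
    lattice = []
    for token, proj in zip(tokens, projected):
        type_tags = tag_dictionary.get(token.lower(), ALL_TAGS)
        if proj is not None and proj in type_tags:
            lattice.append(frozenset({proj}))     # token constraint refines the type constraint
        else:
            lattice.append(frozenset(type_tags))  # fall back to the type constraint
    return lattice
```

A tagger trained on such a lattice only has to choose among the surviving tags, which is one way of learning from ambiguous, incomplete supervision.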
|
145 |
Detekce a segmentace mozkového nádoru v multisekvenčním MRI / Brain Tumor Detection and Segmentation in Multisequence MRI
Dvořák, Pavel, January 2015
This thesis addresses the detection and segmentation of brain tumors in multisequence MR images, focusing on high- and low-grade gliomas. Three methods are proposed for this purpose. The first method detects the presence of brain tumor parts in axial and coronal slices. It is an algorithm based on symmetry analysis at multiple image resolutions and was tested on T1, T2, T1C and FLAIR images. The second method extracts the whole tumor region, including the tumor core and the edema, from FLAIR and T2 images, and it can extract the tumor from both 2D and 3D images. It again uses symmetry analysis, followed by automatic determination of an intensity threshold from the most asymmetric parts. The third method is based on local structure prediction and can segment the whole tumor region, the tumor core, and the active part of the tumor. It exploits the fact that most medical images show high similarity between the intensities of neighboring pixels and a strong correlation between intensities across image modalities. One way to handle and exploit this correlation is to work with local image patches. A similar correlation also exists between neighboring pixels in the image annotation, and this property was used in local structure prediction when labeling local patches. A convolutional neural network is used as the classifier in this method because of its known ability to handle correlations between features. All three methods were tested on a public database of 254 multisequence MR images and achieved accuracy comparable to state-of-the-art methods in much shorter computation time (on the order of seconds on a CPU), which makes manual corrections possible during interactive segmentation.
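The symmetry analysis behind the first two methods can be sketched as follows. This is a hypothetical simplification: it assumes the mid-sagittal plane coincides with the vertical centre line of the slice, and it sets the intensity threshold to the mean intensity of the brighter member of the most asymmetric mirrored pixel pairs. The thesis's multi-resolution analysis and exact thresholding rule are not reproduced.

```python
def asymmetry_map(slice_2d):
    """Absolute left-right intensity difference for each pixel, assuming the
    mid-sagittal plane is the vertical centre line of the image."""
    return [[abs(v - row[len(row) - 1 - j]) for j, v in enumerate(row)]
            for row in slice_2d]


def threshold_from_asymmetry(slice_2d, top_fraction=0.05):
    """Hypothetical rule: mean intensity of the brighter pixel in the most
    asymmetric mirrored pairs (tumour tissue is assumed hyperintense)."""
    asym = asymmetry_map(slice_2d)
    scored = []
    for i, row in enumerate(slice_2d):
        w = len(row)
        for j, v in enumerate(row):
            if v > row[w - 1 - j]:          # keep only the brighter side of each pair
                scored.append((asym[i][j], v))
    if not scored:
        return None                          # perfectly symmetric slice
    scored.sort(reverse=True)                # most asymmetric first
    k = max(1, int(len(scored) * top_fraction))
    return sum(v for _, v in scored[:k]) / k


def segment(slice_2d, top_fraction=0.05):
    """Binary mask of pixels at or above the asymmetry-derived threshold."""
    t = threshold_from_asymmetry(slice_2d, top_fraction)
    if t is None:
        return [[False] * len(row) for row in slice_2d]
    return [[v >= t for v in row] for row in slice_2d]
```

Because the threshold is derived per image, no fixed intensity value has to be chosen across scans with different contrast.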
|
146 |
Sparse RNA folding revisited: space-efficient minimum free energy structure prediction
Will, Sebastian; Jabbari, Hosna, January 2016
Background: RNA secondary structure prediction by energy minimization is the central computational tool for the analysis of structural non-coding RNAs and their interactions. Sparsification has been successfully applied to improve the time efficiency of various structure prediction algorithms while guaranteeing the same result; however, for many such folding problems, space efficiency is of even greater concern, particularly for long RNA sequences. So far, space-efficient sparsified RNA folding with fold reconstruction has been solved only for simple base-pair-based pseudo-energy models. Results: Here, we revisit the problem of space-efficient free energy minimization. Whereas space-efficient minimization of the free energy has been sketched before, reconstruction of the optimum structure has not even been discussed. We show that this reconstruction is not possible as a trivial extension of the method for simple energy models. We then present SparseMFEFold, a time- and space-efficient sparsified free energy minimization algorithm that guarantees MFE structure prediction. In particular, this novel algorithm provides efficient fold reconstruction based on dynamically garbage-collected trace arrows. The complexity of the algorithm depends on two parameters, the number of candidates Z and the number of trace arrows T; both are bounded by n^2 but are typically much smaller. The time complexity of RNA folding is reduced from O(n^3) to O(n^2 + nZ), and the space complexity from O(n^2) to O(n + T + Z). Our empirical results show more than 80% space savings over RNAfold (Vienna RNA package) on the long RNAs (≥2500 bases) from the RNA STRAND database. Conclusions: The presented technique is intentionally generalizable to complex prediction algorithms; due to their high space demands, algorithms such as pseudoknot prediction and RNA-RNA interaction prediction are expected to benefit even more than "standard" MFE folding.
SparseMFEFold is free software, available at http://www.bioinf.uni-leipzig.de/~will/Software/SparseMFEFold.
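For orientation, the sketch below shows the kind of dynamic programme that sparsification speeds up: Nussinov-style base-pair maximization, i.e. a simple base-pair-based model rather than the free energy model that SparseMFEFold handles. It fills an O(n^2) table in O(n^3) time and reconstructs one optimal structure by traceback. The minimum hairpin length of 3 and the allowed pairs (Watson-Crick plus G-U) are assumptions.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (assumption)
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Base-pair maximization with traceback.

    N[i][j] holds the maximum number of pairs in seq[i..j]; the table needs
    O(n^2) space and filling it takes O(n^3) time.
    Returns (max_pairs, dot_bracket_structure).
    """
    n = len(seq)
    if n == 0:
        return 0, ""
    N = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = N[i][j - 1]                        # case 1: j is unpaired
            for k in range(i, j - min_loop):          # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = N[i][k - 1] if k > i else 0
                    best = max(best, left + N[k + 1][j - 1] + 1)
            N[i][j] = best

    # Traceback: recover one optimal structure in dot-bracket notation
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = N[i][k - 1] if k > i else 0
                if left + N[k + 1][j - 1] + 1 == N[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return N[0][n - 1], "".join(structure)
```

The traceback here reads the full table; the abstract's point is that keeping only candidates and trace arrows avoids storing it at all.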
|
147 |
Protein Structural Modeling Using Electron Microscopy Maps
Eman Alnabati, 19 July 2022
Proteins are significant components of living cells. They perform a diverse range of biological functions, such as maintaining cell shape and metabolism. The functions of proteins are determined by their three-dimensional structures. Cryogenic electron microscopy (cryo-EM) is a technology for determining the structures of large macromolecular assemblies, including protein complexes. When the atomic structures of the individual proteins are available, a critical task in structure modeling is fitting them into the cryo-EM density map.
In my research, I report a new computational method, MarkovFit, a machine learning-based method that performs simultaneous rigid fitting of the atomic structures of individual proteins into cryo-EM maps of medium to low resolution to model the three-dimensional structure of protein complexes. MarkovFit uses a Markov random field (MRF), which allows probabilistic evaluation of fitted models. MarkovFit first searches the conformational space with a fast Fourier transform (FFT) search for potential poses of the protein structures, then computes scores that quantify the goodness-of-fit between each individual protein and the cryo-EM map, as well as the interactions between the proteins. The proteins and their interactions are then represented as an MRF graph. The MRF nodes exchange information through a belief propagation algorithm, and the best conformations are extracted and refined using two structural refinement methods.
The performance of MarkovFit was tested on three datasets: simulated cryo-EM maps at 10 Å resolution, high-resolution experimentally determined cryo-EM maps, and experimentally determined cryo-EM maps of medium to low resolution. In addition, MarkovFit was compared with two state-of-the-art methods on their own datasets. Finally, MarkovFit was used to model protein complexes from individual protein atomic models generated by AlphaFold, an AI-based model developed by DeepMind for predicting the 3D structure of proteins from their amino acid sequences.
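Belief propagation on an MRF can be illustrated on the simplest graph where it is exact, a chain. The sketch assumes each protein has a short list of candidate poses with a fit-to-map score and that only consecutive proteins interact. MarkovFit's actual graph, scores and message schedule are not given in the abstract, so this is a hypothetical simplification using max-sum (log-domain max-product) messages.

```python
def best_poses(unary, pairwise):
    """Max-sum belief propagation on a chain MRF.

    unary[i][a]        -- score of candidate pose a for protein i (e.g. map fit)
    pairwise[i][a][b]  -- interaction score between pose a of protein i and
                          pose b of protein i + 1
    Returns (best_total_score, list_of_chosen_pose_indices).
    """
    msg = list(unary[0])      # best score of any partial assignment, per pose of node 0
    back = []                 # back-pointers for decoding the best assignment
    for i in range(1, len(unary)):
        new_msg, ptr = [], []
        for b, u in enumerate(unary[i]):
            scores = [msg[a] + pairwise[i - 1][a][b] for a in range(len(msg))]
            a_best = max(range(len(scores)), key=scores.__getitem__)
            new_msg.append(scores[a_best] + u)
            ptr.append(a_best)
        msg = new_msg
        back.append(ptr)
    last = max(range(len(msg)), key=msg.__getitem__)
    poses = [last]
    for ptr in reversed(back):
        poses.append(ptr[poses[-1]])
    poses.reverse()
    return msg[last], poses
```

On a graph with cycles the same messages would have to be iterated (loopy belief propagation) and are no longer guaranteed to give the exact optimum.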
|
148 |
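La reconnaissance automatique des brins complémentaires : leçons concernant les habiletés des algorithmes d'apprentissage automatique en repliement des acides ribonucléiques / Automatic recognition of complementary strands: lessons on the abilities of machine learning algorithms in ribonucleic acid folding
Chasles, Simon, 07 1900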
Ribonucleic acid (RNA) is a molecule involved in many cellular functions, such as translation and the regulation of gene expression. The recent success of RNA vaccines demonstrates the role RNA can play in the development of therapeutic treatments. The function of an RNA depends on its sequence and structure, which determine which chemical groups can interact with other molecules, and in what ways. However, few RNA structures are known because experimental methods such as nuclear magnetic resonance and X-ray crystallography are costly and low-throughput. As a result, computational methods are constantly being refined to accurately determine the structure of an RNA from its sequence. Given the growth of datasets and the steady progress of deep learning, many neural network architectures have been proposed to solve the RNA folding problem. However, current datasets and the nature of RNA folding mechanisms pose significant obstacles to applying statistical learning to RNA structure prediction. This master's thesis covers the main challenges inherent in solving the RNA folding problem by machine learning. It formulates a fundamental task for studying the behaviour of a variety of algorithms when confronted with various statistical contexts, with the aim of avoiding overfitting, a problem that affects too many of the methods published so far.
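The abstract does not define the fundamental task precisely; one plausible reading of "recognition of complementary strands" from the title is deciding whether two strands can pair along their whole length. A minimal sketch under that assumption, pairing the first strand read 5' to 3' against the second read 3' to 5', with G-U wobble pairs optionally allowed; the function and set names are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def are_complementary(strand_a, strand_b, allow_wobble=True):
    """True if strand_b, read 3' to 5', pairs with every base of strand_a (5' to 3')."""
    if len(strand_a) != len(strand_b):
        return False
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    return all((x, y) in allowed for x, y in zip(strand_a, reversed(strand_b)))
```

A rule this simple makes the label a deterministic function of the input, so any gap between a learned model and this function isolates what the model has (or has not) generalized.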
|
149 |
Structure-Based Computer Aided Drug Design and Analysis for Different Disease Targets
Kumari, Vandana, 13 September 2011
No description available.
|
150 |
Computational analysis of wide-angle light scattering from single cells
Pilarski, Patrick Michael, 11 1900
The analysis of wide-angle cellular light scattering patterns is a challenging problem. Small changes to the organization, orientation, shape, and optical properties of scatterers and scattering populations can significantly alter their complex two-dimensional scattering signatures. Because of this, it is difficult to find methods that can identify medically relevant cellular properties while remaining robust to experimental noise and sample-to-sample differences. The problem matters because recent work has shown that changes to the internal structure of cells (specifically, the distribution and aggregation of organelles) can indicate the progression of a number of common disorders, ranging from cancer to neurodegenerative disease, and can also predict a patient's response to treatments such as chemotherapy. However, there is no direct analytical solution to the inverse wide-angle cellular light scattering problem, and available simulation and interpretation methods either rely on restrictive cell models or are too computationally demanding for routine use.
This dissertation addresses these challenges from a computational vantage point. First, it explores the theoretical limits and optical basis for wide-angle scattering pattern analysis. The result is a rapid new simulation method to generate realistic organelle scattering patterns without the need for computationally challenging or restrictive routines. Pattern analysis, image segmentation, machine learning, and iterative pattern classification methods are then used to identify novel relationships between wide-angle scattering patterns and the distribution of organelles (in this case mitochondria) within a cell. Importantly, this work shows that by parameterizing a scattering image it is possible to extract vital information about cell structure while remaining robust to changes in organelle concentration, effective size, and random placement. The result is a powerful collection of methods to simulate and interpret experimental light scattering signatures. This gives new insight into the theoretical basis for wide-angle cellular light scattering, and facilitates advances in real-time patient care, cell structure prediction, and cell morphology research.
|