141 |
Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision
Täckström, Oscar (January 2013)
Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the linguistic structure of interest. However, such complete supervision is currently only available for the world's major languages, in a limited number of domains and for a limited range of tasks. As an alternative, this dissertation considers methods for linguistic structure prediction that can make use of incomplete and cross-lingual supervision, with the prospect of making linguistic processing tools more widely available at a lower cost. An overarching theme of this work is the use of structured discriminative latent variable models for learning with indirect and ambiguous supervision; as instantiated, these models admit rich model features while retaining efficient learning and inference properties. The first contribution to this end is a latent-variable model for fine-grained sentiment analysis with coarse-grained indirect supervision. The second is a model for cross-lingual word-cluster induction and the application thereof to cross-lingual model transfer. The third is a method for adapting multi-source discriminative cross-lingual transfer models to target languages, by means of typologically informed selective parameter sharing. The fourth is an ambiguity-aware self- and ensemble-training algorithm, which is applied to target language adaptation and relexicalization of delexicalized cross-lingual transfer parsers. The fifth is a set of sequence-labeling models that combine constraints at the level of tokens and types, and an instantiation of these models for part-of-speech tagging with incomplete cross-lingual and crowdsourced supervision. In addition to these contributions, comprehensive overviews are provided of structured prediction with no or incomplete supervision, as well as of learning in the multilingual and cross-lingual settings. 
Through careful empirical evaluation, it is established that the proposed methods can be used to create substantially more accurate tools for linguistic processing, compared both to unsupervised methods and to recently proposed cross-lingual methods. The empirical support for this claim is particularly strong in the latter case; our models for syntactic dependency parsing and part-of-speech tagging achieve the hitherto best published results for a large number of target languages, in the setting where no annotated training data is available in the target language.
|
142 |
Redes neurais residuais profundas e autômatos celulares como modelos para predição que fornecem informação sobre a formação de estruturas secundárias proteicas / Residual neural networks and cellular automata as protein secondary structure prediction models with information about folding
José Geraldo de Carvalho Pereira (15 March 2018)
The process by which a protein structure self-organizes from its amino acid chain is known as folding. Although the three-dimensional structures of many proteins are known, for most of them we do not understand in enough detail how the structure is organized from the amino acid sequence. It is well known that the formation of nuclei of local structure, known as secondary structure, plays a fundamental role in the final folding of the protein. Methods that predict not only the secondary structure adopted by a given residue but also how that process should unfold over time are therefore highly relevant in several areas of structural biology. In this work, we developed two secondary structure prediction methods using models that have the potential to provide more detailed information about the prediction process. One of these models was built using cellular automata, a type of dynamic model from which spatial and temporal information can be obtained. The other was developed using deep residual neural networks. With this model it is possible to extract spatial and probabilistic information from its multiple internal convolution layers, which appears to reflect, in some sense, the stages of secondary structure formation during folding. The prediction accuracy obtained by this model was ~78% for residues on which the structure assignments of the DSSP, STRIDE, KAKSI and PROSS methods agree. Although lower than that of PSIPRED, which uses PSSM matrices as input, this accuracy is higher than that obtained by other methods that predict secondary structure directly from the amino acid sequence.
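The accuracy figure above is computed only over residues where the assignment methods agree. As a minimal sketch of that kind of evaluation, assuming per-residue labels are given as equal-length strings (the function name and the label alphabet are hypothetical, not taken from the thesis):

```python
def consensus_accuracy(prediction, assignments):
    """Accuracy of `prediction` over positions where all assignments agree.

    prediction:  string of per-residue labels, e.g. "HHEEC" (hypothetical alphabet)
    assignments: list of strings, one per assignment method, same length
    """
    correct = 0
    total = 0
    for pos, predicted in enumerate(prediction):
        labels = {assigned[pos] for assigned in assignments}
        if len(labels) == 1:  # every method assigns the same label here
            total += 1
            if predicted == labels.pop():
                correct += 1
    # No consensus positions at all: accuracy is undefined
    return correct / total if total else float("nan")
```

Positions where the methods disagree are simply skipped, so the denominator is the number of consensus residues rather than the sequence length.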
|
143 |
Detekce a segmentace mozkového nádoru v multisekvenčním MRI / Brain Tumor Detection and Segmentation in Multisequence MRI
Dvořák, Pavel (January 2015)
This thesis deals with the detection and segmentation of brain tumors in multisequence MR images, with a focus on high- and low-grade gliomas. Three methods are proposed for this purpose. The first method detects the presence of brain tumor parts in axial and coronal slices. It is an algorithm based on symmetry analysis at multiple image resolutions, and it was tested on T1, T2, T1C and FLAIR images. The second method extracts the whole brain tumor region, comprising the tumor core and edema, in FLAIR and T2 images. It can extract the tumor from both 2D and 3D images. It again uses symmetry analysis, followed by automatic determination of an intensity threshold from the most asymmetric parts. The third method is based on local structure prediction and can segment the whole tumor region, its core, and its active part. It exploits the fact that most medical images show high similarity between the intensities of neighboring pixels and strong correlation between intensities in different image modalities. One way to work with this correlation is to use local image patches. A similar correlation also exists between neighboring pixels in the image annotation. This property is exploited in local structure prediction through local annotation of patches. A convolutional neural network is used as the classification algorithm in this method because of its known ability to handle correlation between features. All three methods were tested on a public database of 254 multisequence MR images and achieved accuracy comparable to state-of-the-art methods in much shorter computation time (on the order of seconds on a CPU), which allows for manual adjustments in interactive segmentation.
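The symmetry analysis mentioned above can be illustrated with a deliberately simplified sketch. It assumes a 2D slice stored as a list of rows, with the vertical midline as the symmetry axis, and scores non-overlapping blocks in the left half against their mirror images. The function name, the block scheme and the absolute-difference score are assumptions for illustration, not the method of the thesis.

```python
def most_asymmetric_block(image, block=2):
    """Return (score, top_row, left_col) of the left-half block that
    differs most from its mirrored counterpart in the right half.

    image: list of equal-length rows of numeric intensities (hypothetical input)
    block: side length of the square blocks that are compared
    """
    height, width = len(image), len(image[0])
    best = None
    for top in range(0, height - block + 1, block):
        for left in range(0, width // 2 - block + 1, block):
            score = 0
            for r in range(top, top + block):
                for c in range(left, left + block):
                    # Compare each pixel with its left-right mirror image
                    score += abs(image[r][c] - image[r][width - 1 - c])
            if best is None or score > best[0]:
                best = (score, top, left)
    return best
```

A threshold could then be estimated from the intensities inside the highest-scoring blocks, and the same scan could be repeated with several block sizes to mimic a multi-resolution analysis.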
|
144 |
Sparse RNA folding revisited: space-efficient minimum free energy structure prediction
Will, Sebastian; Jabbari, Hosna (January 2016)
Background: RNA secondary structure prediction by energy minimization is the central computational tool for the analysis of structural non-coding RNAs and their interactions. Sparsification has been successfully applied to improve the time efficiency of various structure prediction algorithms while guaranteeing the same result; however, for many such folding problems, space efficiency is of even greater concern, particularly for long RNA sequences. So far, space-efficient sparsified RNA folding with fold reconstruction was solved only for simple base-pair-based pseudo-energy models. Results: Here, we revisit the problem of space-efficient free energy minimization. Whereas the space-efficient minimization of the free energy has been sketched before, the reconstruction of the optimum structure has not even been discussed. We show that this reconstruction is not possible by a trivial extension of the method for simple energy models. Then, we present the time- and space-efficient sparsified free energy minimization algorithm SparseMFEFold, which guarantees MFE structure prediction. In particular, this novel algorithm provides efficient fold reconstruction based on dynamically garbage-collected trace arrows. The complexity of our algorithm depends on two parameters, the number of candidates Z and the number of trace arrows T; both are bounded by n^2, but are typically much smaller. The time complexity of RNA folding is reduced from O(n^3) to O(n^2 + nZ); the space complexity, from O(n^2) to O(n + T + Z). Our empirical results show more than 80% space savings over RNAfold (Vienna RNA package) on the long RNAs from the RNA STRAND database (≥2500 bases). Conclusions: The presented technique is intentionally generalizable to complex prediction algorithms; due to their high space demands, algorithms like pseudoknot prediction and RNA-RNA interaction prediction are expected to profit even more than "standard" MFE folding. SparseMFEFold is free software, available at http://www.bioinf.uni-leipzig.de/~will/Software/SparseMFEFold.
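As a point of reference for the simple base-pair-based models mentioned in the abstract, the following is a minimal sketch of classic base-pair maximization (Nussinov-style dynamic programming). It is not SparseMFEFold and applies no sparsification: it fills the full n × n table, which corresponds to the O(n^2) space and O(n^3) time baseline that sparsification aims to improve. The function name, the allowed pair set (including G-U wobble pairs) and the `min_loop` parameter are assumptions made for illustration.

```python
# Allowed base pairs (assumption: Watson-Crick plus G-U wobble)
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in `seq`.

    min_loop: minimum number of unpaired bases enclosed by any pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):  # case: k pairs with j
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

The inner loop over every split point k is what makes the baseline cubic; sparsified algorithms restrict that loop to a list of candidate positions, which is where the parameter Z in the abstract comes from.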
|
145 |
Protein Structural Modeling Using Electron Microscopy Maps
Eman Alnabati (13108032) (19 July 2022)
Proteins are significant components of living cells. They perform a diverse range of biological functions, such as maintaining cell shape and driving metabolism. The functions of proteins are determined by their three-dimensional structures. Cryogenic electron microscopy (cryo-EM) is a technique for determining the structures of large macromolecular assemblies, including protein complexes. When individual atomic protein structures are available, a critical task in structure modeling is fitting the individual structures into the cryo-EM density map.
In my research, I report a new computational method, MarkovFit, a machine learning-based method that performs simultaneous rigid fitting of the atomic structures of individual proteins into cryo-EM maps of medium to low resolution in order to model the three-dimensional structure of protein complexes. MarkovFit uses a Markov random field (MRF), which allows probabilistic evaluation of fitted models. MarkovFit starts by searching the conformational space using fast Fourier transforms (FFT) for potential poses of the protein structures, then computes scores that quantify the goodness-of-fit between each individual protein and the cryo-EM map, as well as the interactions between the proteins. The proteins and their interactions are then represented as an MRF graph. MRF nodes exchange information using a belief propagation algorithm, and the best conformations are extracted and refined using two structural refinement methods.
The performance of MarkovFit was tested on three datasets: a dataset of simulated cryo-EM maps at 10 Å resolution, a dataset of high-resolution experimentally determined cryo-EM maps, and a dataset of experimentally determined cryo-EM maps of medium to low resolution. In addition, the performance of MarkovFit was compared with that of two state-of-the-art methods on their own datasets. Lastly, MarkovFit was used to model protein complexes from individual protein atomic models generated by AlphaFold, an AI-based model developed by DeepMind for predicting the 3D structure of proteins from their amino acid sequences.
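The idea of choosing one pose per protein by passing messages over an MRF can be sketched on a toy problem. The example below assumes the proteins form a simple chain, so that max-product message passing is exact; a general interaction graph would need loopy belief propagation, and nothing here reproduces MarkovFit's actual scores or graph. The function name and the score tables are hypothetical.

```python
def map_poses_chain(unary, pairwise):
    """Max-product message passing on a chain-shaped MRF (scores are additive).

    unary[i][a]:       fit score of protein i placed in candidate pose a
    pairwise[i][a][b]: interaction score between protein i in pose a
                       and protein i + 1 in pose b
    Returns (best_total_score, list_of_chosen_pose_indices).
    """
    messages = [list(unary[0])]
    backpointers = []
    for i in range(1, len(unary)):
        prev = messages[-1]
        current, pointers = [], []
        for b in range(len(unary[i])):
            # Best score reaching pose b of protein i from any pose of protein i - 1
            candidates = [prev[a] + pairwise[i - 1][a][b] for a in range(len(prev))]
            a_best = max(range(len(candidates)), key=candidates.__getitem__)
            current.append(candidates[a_best] + unary[i][b])
            pointers.append(a_best)
        messages.append(current)
        backpointers.append(pointers)
    # Pick the best final pose, then follow the back-pointers to the start
    last = max(range(len(messages[-1])), key=messages[-1].__getitem__)
    poses = [last]
    for pointers in reversed(backpointers):
        poses.append(pointers[poses[-1]])
    poses.reverse()
    return messages[-1][last], poses
```

With three proteins and two candidate poses each, the call `map_poses_chain([[1, 2], [1, 0], [3, 1]], [[[0, 2], [-5, -1]], [[0, 0], [2, -1]]])` selects the combination whose summed fit and interaction scores are highest.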
|
146 |
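La reconnaissance automatique des brins complémentaires : leçons concernant les habiletés des algorithmes d'apprentissage automatique en repliement des acides ribonucléiques / Automatic recognition of complementary strands: lessons on the abilities of machine learning algorithms in ribonucleic acid folding
Chasles, Simon (07 1900)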
Ribonucleic acid (RNA) is a molecule involved in many cellular functions, such as translation and the regulation of gene expression. The recent success of RNA vaccines demonstrates the role RNA can play in the development of therapeutic treatments. The function of an RNA depends on its sequence and structure, which determine which chemical groups can interact with other molecules, and in what ways. However, only a few RNA structures are known, owing to the high cost and low throughput of experimental methods such as nuclear magnetic resonance and X-ray crystallography. As a result, computational methods are constantly being refined to accurately determine the structure of an RNA from its sequence. Given the growth of datasets and the constant progress of deep learning, many neural network architectures have been proposed to solve the RNA folding problem. However, current datasets and the nature of RNA folding mechanisms pose significant obstacles to applying statistical learning to RNA structure prediction. This master's thesis covers the main challenges inherent in solving the RNA folding problem by machine learning. It formulates a fundamental task for studying the behaviour of a variety of algorithms when confronted with various statistical contexts, with the aim of avoiding overfitting, a problem that affects too many of the methods published so far.
|
147 |
Structure-Based Computer Aided Drug Design and Analysis for Different Disease Targets
Kumari, Vandana (13 September 2011)
No description available.
|
148 |
Computational analysis of wide-angle light scattering from single cells
Pilarski, Patrick Michael (11 1900)
The analysis of wide-angle cellular light scattering patterns is a challenging problem. Small changes to the organization, orientation, shape, and optical properties of scatterers and scattering populations can significantly alter their complex two-dimensional scattering signatures. Because of this, it is difficult to find methods that can identify medically relevant cellular properties while remaining robust to experimental noise and sample-to-sample differences. It is an important problem. Recent work has shown that changes to the internal structure of cells (specifically, the distribution and aggregation of organelles) can indicate the progression of a number of common disorders, ranging from cancer to neurodegenerative disease, and can also predict a patient's response to treatments like chemotherapy. However, there is no direct analytical solution to the inverse wide-angle cellular light scattering problem, and available simulation and interpretation methods either rely on restrictive cell models, or are too computationally demanding for routine use.
This dissertation addresses these challenges from a computational vantage point. First, it explores the theoretical limits and optical basis for wide-angle scattering pattern analysis. The result is a rapid new simulation method to generate realistic organelle scattering patterns without the need for computationally challenging or restrictive routines. Pattern analysis, image segmentation, machine learning, and iterative pattern classification methods are then used to identify novel relationships between wide-angle scattering patterns and the distribution of organelles (in this case mitochondria) within a cell. Importantly, this work shows that by parameterizing a scattering image it is possible to extract vital information about cell structure while remaining robust to changes in organelle concentration, effective size, and random placement. The result is a powerful collection of methods to simulate and interpret experimental light scattering signatures. This gives new insight into the theoretical basis for wide-angle cellular light scattering, and facilitates advances in real-time patient care, cell structure prediction, and cell morphology research.
|
149 |
Computational analysis of wide-angle light scattering from single cells
Pilarski, Patrick Michael (Unknown Date)
No description available.
|
150 |
Computational studies of protein helix kinks
Wilman, Henry R. (January 2014)
Kinks are functionally important structural features found in the alpha-helices of many proteins, particularly membrane proteins. Structurally, they are points at which a helix abruptly changes direction. Previous kink definition and identification methods often disagree with one another. Here I describe three novel methods to characterise kinks, which improve on existing approaches. First, Kink Finder, a computational method that consistently locates kinks and estimates the error in the kink angle. Second, the B statistic, a statistically robust method for identifying kinks. Third, Alpha Helices Assessed by Humans, a crowdsourcing approach that provided a gold-standard data set on which to train and compare existing kink identification methods. In this thesis, I show that kinks are a feature of long α-helices in both soluble and membrane proteins, rather than just transmembrane α-helices. Characteristics of kinks in the two types of proteins are similar, with proline being the dominant feature in both. In soluble proteins, kinked helices also have a clear structural preference in that they typically point into the solvent. I also explored the conservation of kinks in homologous proteins. I found examples of conserved and non-conserved kinks in both the helix pairs and the helix families. Helix pairs with non-conserved kinks generally have less similar sequences than helix pairs with conserved kinks. I identified helix families that show highly conserved kinks, and families that contain non-conserved kinks, suggesting that some kinks may be flexible points in protein structures.
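Since a kink is described as a point where the helix changes direction, one simple way to quantify it is the angle between the helix axis before and after the kink. The sketch below assumes the two axis direction vectors are already available, for example from fitting lines to the backbone on either side; the function name is hypothetical and this is not the fitting procedure used by Kink Finder.

```python
import math

def kink_angle_degrees(axis_before, axis_after):
    """Angle in degrees between two helix axis direction vectors.

    axis_before, axis_after: non-zero 3D vectors as sequences of floats.
    """
    dot = sum(a * b for a, b in zip(axis_before, axis_after))
    norm_before = math.sqrt(sum(a * a for a in axis_before))
    norm_after = math.sqrt(sum(b * b for b in axis_after))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain
    cos_angle = max(-1.0, min(1.0, dot / (norm_before * norm_after)))
    return math.degrees(math.acos(cos_angle))
```

An unkinked helix gives an angle near zero, while larger values indicate a sharper change of direction.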
|