Global ETD Search

1	Computational statistical mechanics of protein function Mugnai, Mauro Lorenzo 24 October 2014 Molecular dynamics (MD) provides an atomically detailed description of the dynamics of a system of atoms. It is a useful tool to understand how protein function arises from the dynamics of the atoms of the protein and of its environment. When the MD model is accurate, analyzing an MD trajectory unveils features of the proteins that are not available from a single snapshot or a static structure. When the sampling of the accessible configurations is accurate, we can employ statistical mechanics (SM) to connect the trajectory generated by MD to experimentally measurable kinetic and thermodynamic quantities that are related to function. In this dissertation I describe three applications of MD and SM in the field of biochemistry. First, I discuss the theory of alchemical methods to compute free energy differences. In these methods a fragment of a system is computationally modified by removing its interactions with the environment and creating the interactions of the environment with the new species. This theory provides a numerical scheme to efficiently compute protein-ligand affinity, solvation free energies, and the effect of mutations on protein structure. I investigated the theory and stability of the numerical algorithm. The second research topic that I discuss considers a model of the dynamics of a set of coarse variables. The dynamics in coarse space is modeled by the Smoluchowski equation. To employ this description it is necessary to have the correct potential of mean force and diffusion tensor in the space of coarse variables. I describe a new method that I developed to extract the diffusion tensor from an MD simulation. Finally, I employed MD simulations to explain at a microscopic level the stereospecificity of the enzyme ketoreductase.
To do so, I ran multiple simulations of the enzyme bound with the correct ligand and its enantiomer in a reactive configuration. The simulations showed that the enzyme retained the correct stereoisomer closer to the reactive configuration, and highlighted which interactions are responsible for the specificity. These weak physical interactions enhance binding with the correct ligand even prior to the steps of chemical modification. / text Molecular dynamics Protein function
2	PREDICTION OF PROTEIN FUNCTION USING TEXT FEATURES EXTRACTED FROM THE BIOMEDICAL LITERATURE Wong, ANDREW 25 April 2013 Proteins perform many important functions in the cell and are essential to the health of the cell and the organism. As such, there is much effort to understand the function of proteins. Due to the advances in sequencing technology, there are many sequences of proteins whose function is yet unknown. Therefore, computational systems are being developed and used to help predict protein function. Most computational systems represent proteins using features that are derived from protein sequence or protein structure to predict function. In contrast, there are very few systems that use the biomedical literature as a source of features. Earlier work demonstrated the utility of biomedical literature as a source of text features for predicting protein subcellular location. In this thesis we build on that earlier work, and examine the effectiveness of using text features to predict protein function. Using the molecular function and biological process terms from the Gene Ontology (GO) as our function classes, we trained two classifiers (k-Nearest Neighbour and Support Vector Machines) to predict protein function. The proteins were represented using text features that were extracted from biomedical abstracts based on statistical properties. For evaluation, the performance of our two classifiers was compared to that of two baseline classifiers: one that assigns function based solely on the prior distribution of protein function, and one that assigns function based on sequence similarity. The systems were trained and tested using 5-fold cross-validation over a dataset of more than 36,000 proteins. Overall, we show that text features extracted from biomedical literature can be used to predict protein function for any organism.
Our results also show that our text-based classifier typically has comparable performance to the sequence-similarity baseline classifier. Based on our results and what previous work had shown, we believe that text features can be integrated with other types of features to provide more accurate predictions for protein function. / Thesis (Master, Computing) -- Queen's University, 2013-04-24 21:07:13.983 computer science protein function prediction
3	Dehydron as a Marker For Drug Design Jain, Manojkumar D. 26 July 2006 Submitted to the faculty of the University Graduate School in partial fulfillment of the requirements for the degree Master of Science in the School of Informatics, Indiana University December 2005 / The approach of exploiting highly conserved protein folds and structure in understanding protein function and in designing drugs leads to drugs that are less selective due to association with similar proteins. Over the years an open problem for researchers has been to develop drug design models based on non-conserved features to have higher selectivity. Recently a new structural feature, the dehydron, has been demonstrated to vary across proteins with conserved folds. Dehydrons are backbone hydrogen bonds that are not adequately protected from water. The importance of wrapping dehydrons in ligand binding and non-conservation of dehydrons across similar proteins makes them important candidates for markers in drug design. Investigation on a series of proteins – PDB entries: 1IA8, 1NVQ, 1NVS, 1NVR, 1OKZ, and 1PKD – revealed the potential impact of wrapping on binding affinity of the ligands. Unlike in 1NVS, 1NVR, 1OKZ, and 1PKD, inhibitor UCN in 1NVQ wrapped both the dehydrons in the active site region of the checkpoint protein kinase, thereby indicating an increased potency and higher selectivity. On detailed analysis of 193 protein kinases, roughly 70% were found to have two or more dehydrons in the neighborhood of the bound ligand. Also, about 70% of proteins had dehydrons within the active site region. Only around 20% of ligands, however, actually wrapped two or more dehydrons. These statistics clearly illustrate the significance of dehydrons and their potential use as markers for drug design to enhance drug efficacy as well as selectivity, and to reduce side effects in the process. protein function protein dehydron drug design selectivity
4	Geometry-based methods for protein function prediction Chen, Brian Yuan January 2006 The development of new and effective drugs is strongly affected by the need to identify drug targets and to reduce side effects. Unfortunately, resolving these issues depends partially on a broad and thorough understanding of the biological function of many proteins, and the experimental determination of protein function is expensive and time consuming. In response to this problem, algorithms for computational function prediction have been designed to expand experimental impact by finding proteins with predictably similar function, mapping experimental knowledge onto very similar, unstudied proteins. This thesis seeks to develop one method that can identify useful geometric and chemical similarities between well studied and unstudied proteins. Our approach is to identify matches of geometric and chemical similarity between motifs, representing known functional sites, and substructures of functionally uncharacterized proteins (targets). It is commonly hypothesized that the existence of a match could imply that the target contains an active site similar to the motif. We have designed the MASH (Match Augmentation with Statistical Hypothesis Testing) pipeline, a software tool for computing matches. MASH is the first method to match point-based motifs, developed in earlier work, that represent functional sites as points in space with ranked priorities and alternative chemical labels. MASH is also the first to match cavity-aware motifs, a novel contribution of this work, that extend point-based motifs with volumetric information describing active clefts critical to protein function. Controlled experiments demonstrate that matches for both types of motifs can identify cognate active sites. However, motifs can also identify matches to functionally unrelated proteins.
For this reason, we developed Motif Profiling (MP), the first method for motif refinement that reduces geometric similarity to functionally unrelated proteins. MP is implemented in two forms: Geometric Sieving (GS) refines point-based motifs and Cavity Scaling (CS) refines cavity-aware motifs. Controlled experimentation demonstrates that GS and CS identify motif refinements that have more matches to functionally related proteins and fewer matches to functionally unrelated proteins. This thesis demonstrates the importance of computational tools for matching and refining motifs, emphasizing the applicability of large-scale geometric and statistical analysis for functional annotation. / National Science Foundation, National Library of Medicine, AMD, Cray Protein function Motif profiling Bioinformatics Geometric Sieving Cavity Scaling
5	In-silico characterization and prediction of protein-small ligand interactions Chen, Ke Unknown Date No description available. protein ligand interaction prediction binding site protein function annotation
6	Automatic Protein Function Annotation Through Text Mining Toonsi, Sumyyah 25 August 2019 The knowledge of a protein’s function is essential to many studies in molecular biology, genetic experiments and protein-protein interactions. The Gene Ontology (GO) captures gene products' functions in classes and establishes relationships between them. Manually annotating proteins with GO functions from the bio-medical literature is a tedious process which calls for automation. We develop a novel, dictionary-based method to annotate proteins with functions from text. We extract text-based features from words matched against a dictionary of GO. Since classes are included upon any word match with their class description, the number of negative samples outnumbers the positive ones. To mitigate this imbalance, we apply strict rules before weakly labeling the dataset according to the curated annotations. Furthermore, we discard samples of low statistical evidence and train a logistic regression classifier. The results of a 5-fold cross-validation show a high precision of 91% and 96% accuracy in the best-performing fold. The worst fold showed a precision of 80% and an accuracy of 95%. We conclude by explaining how this method can be used for similar annotation problems. Protein function Gene Ontology Text Mining Biomedical Annotation Automatic
7	Learning 3D structures for protein function prediction Muttakin, Md Nurul 05 1900 Machine learning models such as AlphaFold can generate protein 3D conformation from primary sequence up to experimental accuracy, which has given rise to a number of studies that predict protein functions from 3D structures. Almost all of these works attempted to use graph neural networks (GNN) to learn 3D structures of proteins from 2D contact maps/graphs. Most of these works use rich 1D features such as ESM and LSTM embeddings in addition to the contact graph. These rich 1D features essentially obfuscate the learning capability of GNNs. In this thesis, we evaluate the learning capabilities of GCNs from contact map graphs in the existing framework, where we attempt to incorporate distance information for better predictive performance. We found that GCNs fall far short of a 1D-CNN without language models, even with distance information. Consequently, we further investigate the capabilities of GCNs to distinguish subgraph patterns corresponding to the InterPro domains. We found that GCNs perform better than highly rich sequence embedding with MLP in recognizing the structural patterns. Finally, we investigate the capability of GCNs to predict GO-terms (functions) individually. We found that GCNs perform almost on par in identifying GO-terms in the presence of only hard positive and hard negative examples. We also identified some GO-terms indistinguishable by GCNs and ESM2-based MLP models. This gives rise to new research questions to be investigated in future work. 3D structure learning Protein function prediction Graph neural networks
8	Predicting Protein Functions From Interactions Using Neural Networks and Ontologies Qathan, Shahad 22 November 2022 To understand the process of life, it is crucial for us to study proteins and their functions. Proteins execute (almost) all cellular activities, and their functions are standardized by Gene Ontology (GO). The amount of discovered protein sequences grows rapidly as a consequence of the fast rate of development of technologies in gene sequencing. In UniProtKB, there are more than 200 million proteins. Still, less than 1% of the proteins in the UniProtKB database are experimentally GO-annotated, which is the result of the exorbitant cost of biological experiments. To minimize the large gap, developing an efficient and effective method for automatic protein function prediction (AFP) is essential. Many approaches have been proposed to solve the AFP problem. Still, these methods suffer from limitations in the way the knowledge of the domain is presented and what type of knowledge is included. In this work, we formulate the task of AFP as an entailment problem and exploit the structure of the related knowledge in a set and reusable framework. To achieve this goal, we construct a knowledge base of formal GO axioms and protein-protein interactions to use as background knowledge for AFP. Our experiments show that the approach proposed here, which allows for ontology awareness, improves results for AFP of proteins; they also show the importance of including protein-protein interactions for predicting the functions of proteins. Machine Learning Ontologies Graph Neural Networks Protein Function Prediction
9	Pattern Oriented Methods for Inferring Protein Annotations within Protein Interaction Networks Kirac, Mustafa January 2009 No description available. Bioinformatics Computer Science Bioinformatics protein interaction networks protein function prediction
10	Statistical Phylogenetic Models for the Inference of Functionally Important Regions in Proteins Huang, Yifei 04 1900 An important question in biology is the identification of functionally important sites and regions in proteins. A variety of statistical phylogenetic models have been developed to predict functionally important protein sites, e.g. ligand binding sites or protein-protein interaction interfaces, by comparing sequences from different species. However, most of the existing methods ignore the spatial clustering of functionally important sites in protein tertiary/primary structures, which significantly reduces their power to identify functionally important regions in proteins. In this thesis, we present several new statistical phylogenetic models for inferring functionally important protein regions in which Gaussian processes or hidden Markov models are used as prior distributions to model the spatial correlation of evolutionary patterns in protein tertiary/primary structures. Both simulation studies and empirical data analyses suggest that these new models outperform classic phylogenetic models. Therefore, these new models may be useful tools for extracting functional insights from protein sequences and for guiding mutagenesis experiments. Furthermore, the new methodologies developed in these models may also be used in the development of new statistical models to answer other important questions in phylogenetics and molecular evolution. / Doctor of Philosophy (PhD) Phylogenetics Bayesian Model Protein Function Statistics Bioinformatics Bioinformatics
