1 |
PREDICTION OF PROTEIN FUNCTION USING TEXT FEATURES EXTRACTED FROM THE BIOMEDICAL LITERATURE (Wong, Andrew; 25 April 2013)
Proteins perform many important functions in the cell and are essential to the health of the cell and the organism. As such, much effort goes into understanding protein function. Owing to advances in sequencing technology, there are many protein sequences whose functions are not yet known. Therefore, computational systems are being developed and used to help predict protein function.
Most computational systems represent proteins using features that are derived from protein sequence or protein structure to predict function. In contrast, there are very few systems that use the biomedical literature as a source of features. Earlier work demonstrated the utility of biomedical literature as a source of text features for predicting protein subcellular location. In this thesis we build on that earlier work, and examine the effectiveness of using text features to predict protein function.
Using the molecular function and biological process terms from the Gene Ontology (GO) as our function classes, we trained two classifiers (k-Nearest Neighbour and Support Vector Machines) to predict protein function. The proteins were represented using text features that were extracted from biomedical abstracts based on statistical properties. For evaluation, the performance of our two classifiers was compared to that of two baseline classifiers: one that assigns function based solely on the prior distribution of protein function, and one that assigns function based on sequence similarity. The systems were trained and tested using 5-fold cross-validation over a dataset of more than 36,000 proteins.
Overall, we show that text features extracted from the biomedical literature can be used to predict protein function for any organism. Our results also show that our text-based classifier typically performs comparably to the sequence-similarity baseline classifier. Based on our results and on previous work, we believe that text features can be integrated with other types of features to provide more accurate protein function predictions. / Thesis (Master, Computing), Queen's University, 2013-04-24.
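To make the evaluation setup described above concrete, here is a minimal Python sketch of text-feature k-nearest-neighbour GO prediction scored with 5-fold cross-validation. It assumes each protein is already represented as a bag of word counts from its abstracts and uses cosine similarity, k = 3 and a 50% vote threshold; these choices, the function names and the micro-averaged F1 metric are illustrative assumptions, not the statistical feature selection or scoring used in the thesis.

```python
from collections import Counter
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(query, train, k=3, threshold=0.5):
    """Predict GO terms for `query` (a Counter of text features).

    `train` is a list of (Counter, set_of_go_terms) pairs. A term is
    predicted if at least `threshold` of the k nearest neighbours carry it.
    """
    neighbours = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)[:k]
    votes = Counter(term for _, terms in neighbours for term in terms)
    return {term for term, n in votes.items() if n / len(neighbours) >= threshold}


def five_fold_f1(data, k=3):
    """Micro-averaged F1 over 5-fold cross-validation (round-robin folds)."""
    tp = fp = fn = 0
    folds = [data[i::5] for i in range(5)]
    for i, test in enumerate(folds):
        # Train on the other four folds, test on fold i.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        for features, true_terms in test:
            pred = knn_predict(features, train, k)
            tp += len(pred & true_terms)
            fp += len(pred - true_terms)
            fn += len(true_terms - pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

A multi-label vote like this lets one protein receive several GO terms at once, which matches how GO annotation works; a prior-only baseline would simply predict the most frequent terms for every protein.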
|
2 |
Learning 3D structures for protein function prediction (Muttakin, Md Nurul; 05 1900)
Machine learning models such as AlphaFold can generate protein 3D conformations from primary sequence with accuracy approaching that of experiments, which has given rise to a body of research on predicting protein functions from 3D structures. Almost all of these works use graph neural networks (GNNs) to learn protein 3D structure from 2D contact maps (graphs). Most of them also supply rich 1D features, such as ESM and LSTM embeddings, alongside the contact graph, and these rich 1D features essentially obscure how much the GNNs themselves learn. In this thesis, we evaluate the learning capability of graph convolutional networks (GCNs) on contact-map graphs within the existing framework, and we attempt to incorporate distance information to improve predictive performance. We found that, without language-model features, GCNs fall far short of 1D-CNNs, even with distance information. Consequently, we further investigate whether GCNs can distinguish subgraph patterns corresponding to InterPro domains, and found that GCNs recognize these structural patterns better than an MLP on highly rich sequence embeddings. Finally, we investigate the capability of GCNs to predict individual GO terms (functions). We found that GCNs perform almost on par with the baseline models in identifying GO terms when only hard positive and hard negative examples are present. We also identified some GO terms that neither GCNs nor ESM2-based MLP models can distinguish, which raises new research questions for future work.
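The abstract contrasts binary contact graphs with graphs that carry distance information. A minimal, dependency-free sketch of both is below: it builds a contact map from C-alpha coordinates and applies one graph-convolution layer with the widely used symmetric normalization, ReLU(D^-1/2 (A + I) D^-1/2 H W). The 8 Å cutoff, the 1/(1 + d) edge weighting and all function names are assumptions for illustration; the thesis may encode distances differently.

```python
import math


def contact_graph(coords, cutoff=8.0, use_distance=True):
    """Adjacency matrix from C-alpha coordinates (list of (x, y, z) tuples).

    Residues closer than `cutoff` angstroms are connected. With
    use_distance=True the edge weight decays as 1 / (1 + d) (an assumed
    scheme); otherwise edges are binary contacts.
    """
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(coords[i], coords[j])
                if d < cutoff:
                    adj[i][j] = 1.0 / (1.0 + d) if use_distance else 1.0
    return adj


def gcn_layer(adj, h, w):
    """One graph-convolution layer: relu(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n adjacency, h: n x f node features, w: f x o weight matrix.
    """
    n = len(adj)
    # Add self-loops so each residue keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbour features, then apply the weights and a ReLU.
    agg = [[sum(norm[i][k] * h[k][f] for k in range(n)) for f in range(len(h[0]))]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * w[f][o] for f in range(len(w))))
             for o in range(len(w[0]))] for i in range(n)]
```

Because the layer only mixes features along graph edges, a GCN fed bare contact maps must learn function-relevant patterns from topology and distances alone, which is the setting the thesis isolates by removing the rich 1D embeddings.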
|
3 |
Predicting Protein Functions From Interactions Using Neural Networks and Ontologies (Qathan, Shahad; 22 November 2022)
To understand life processes, it is crucial to study proteins and their functions. Proteins carry out (almost) all cellular activities, and their functions are described in a standardized way by the Gene Ontology (GO). The number of known protein sequences is growing rapidly as gene sequencing technologies advance. UniProtKB contains more than 200 million proteins, yet fewer than 1% of them have experimental GO annotations, largely because biological experiments are so costly. To narrow this large gap, an efficient and effective method for automatic protein function prediction (AFP) is essential.
Many approaches have been proposed to solve the AFP problem, but they are limited in how they represent domain knowledge and in what kinds of knowledge they include. In this work, we formulate AFP as an entailment problem and exploit the structure of the related knowledge within a reusable framework. To achieve this goal, we construct a knowledge base of formal GO axioms and protein-protein interactions to serve as background knowledge for AFP. Our experiments show that the proposed approach, which is ontology-aware, improves AFP results, and they also show the importance of including protein-protein interactions when predicting protein functions.
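One way to picture "AFP as entailment" is to treat GO subclass axioms as rules that propagate annotations upward (if a protein has function C and C is a subclass of D, it also has D), and to use interaction partners as a source of candidate functions. The sketch below is a purely symbolic stand-in, assuming a toy is_a hierarchy and a hypothetical `min_support` threshold; it does not reproduce the neural, ontology-aware model described in the thesis.

```python
from collections import defaultdict


def superclasses(term, is_a):
    """All ancestors of `term` under the transitive closure of is_a axioms.

    `is_a` maps a term to a list of its direct parent terms.
    """
    seen, stack = set(), [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def entailed_annotations(annotations, is_a):
    """Close each protein's annotations upward:
    has(p, C) and C subClassOf D  =>  has(p, D)."""
    return {p: set(terms).union(*(superclasses(t, is_a) for t in terms))
            for p, terms in annotations.items()}


def candidates_from_interactions(protein, interactions, annotations, is_a, min_support=2):
    """Hypothetical heuristic: propose terms held (after ontology closure)
    by at least `min_support` interaction partners of `protein`."""
    closed = entailed_annotations(annotations, is_a)
    counts = defaultdict(int)
    for partner in interactions.get(protein, ()):
        for term in closed.get(partner, ()):
            counts[term] += 1
    return {t for t, c in counts.items() if c >= min_support}
```

Closing annotations under the ontology before counting matters: two partners annotated with different but related terms still agree on their shared ancestor, which a flat, ontology-unaware count would miss.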
|
4 |
Pattern Oriented Methods for Inferring Protein Annotations within Protein Interaction Networks (Kirac, Mustafa; January 2009)
No description available.
|
5 |
Redundancy-aware learning of protein structure-function relationships (Bryant, Drew; 13 May 2013)
The protein kinases are a large family of enzymes that play a fundamental role in propagating signals within the cell. Because of the high degree of binding site similarity shared among protein kinases, designing drug compounds with high specificity among the kinases has proven difficult. However, computational approaches to comparing the 3-dimensional geometry and physicochemical properties of key binding site residues, referred to here as substructures, have been shown to be informative of inhibitor selectivity. This thesis introduces two fundamental approaches for the comparative analysis of substructure similarity and demonstrates the importance of each method on a variety of large protein structure datasets for multiple biological applications.
The Family-wise Alignment of SubStructural Templates Framework (The FASST Framework) provides an unsupervised learning approach for identifying substructure clusterings. The clusterings identified by FASST allow for the automatic evaluation of substructure variability, the identification of distinct structural conformations, and the selection of anomalous outlier structures within large structure datasets. These clusterings are shown to identify biologically meaningful structural trends across a diverse set of protein families. The FASST Live visualization and analysis platform provides multiple comparative analysis pipelines and lets users interactively explore the substructure clusterings computed by FASST.
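The clustering idea behind FASST can be illustrated with a small sketch: compute the RMSD between binding-site substructures and group those within a threshold using single-linkage clustering, so that singleton clusters stand out as candidate outliers. This assumes the substructures are already superposed with residues in corresponding order and uses an arbitrary 1.0 Å threshold; FASST's actual alignment and clustering procedure is not reproduced here.

```python
import math


def rmsd(a, b):
    """RMSD between two equally sized lists of (x, y, z) points.

    Assumes the substructures are already superposed; no alignment is done.
    """
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def cluster_substructures(subs, threshold=1.0):
    """Single-linkage clustering with union-find: substructures within
    `threshold` angstroms RMSD of each other share a cluster.
    Returns a sorted list of index lists."""
    parent = list(range(len(subs)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(subs)):
        for j in range(i + 1, len(subs)):
            if rmsd(subs[i], subs[j]) <= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(subs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```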
The Combinatorial Clustering Of Residue Position Subsets (CCORPS) method provides a supervised learning approach for identifying structural features that are correlated with a given set of annotation labels. The ability of CCORPS to identify structural features predictive of functional divergence among families of homologous enzymes is demonstrated across 48 distinct protein families. CCORPS is further shown to generalize to the very difficult problem of predicting protein kinase inhibitor affinity: it makes perfect or near-perfect predictions of binding ability for 12 of the 38 kinase inhibitors studied, and performs poorly overall for only 1 of the 38 compounds. Additionally, CCORPS identifies shared structural features, correlated with binding affinity for particular inhibitors, across phylogenetically diverse groups of kinases; such instances of structural similarity among phylogenetically diverse kinases are also shown not to be rare. Finally, these function-specific structural features may serve as potential starting points for the development of highly specific kinase inhibitors.
Importantly, both The FASST Framework and CCORPS implement a redundancy-aware approach to dealing with structure overrepresentation that allows for the incorporation of all available structure data. As shown in this thesis, surprising structural variability exists even among structure datasets consisting of a single protein sequence. By incorporating the full variety of structural conformations within the analysis, the methods presented here provide a richer view of the variability of large protein structure datasets.
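A common way to keep every structure while preventing heavily studied sequences from dominating is to down-weight each structure by how many structures share its sequence. The sketch below shows that idea; it is an assumed, illustrative weighting scheme, not necessarily the redundancy-aware mechanism used by FASST or CCORPS, and the conformation labels used with it are toy values.

```python
from collections import Counter


def redundancy_weights(structure_seq_ids):
    """Per-structure weights so that every distinct sequence contributes a
    total weight of 1, while all structures stay in the analysis."""
    counts = Counter(structure_seq_ids)
    return [1.0 / counts[s] for s in structure_seq_ids]


def weighted_label_frequency(labels, weights):
    """Weighted fraction of structures carrying each label."""
    total = sum(weights)
    freq = Counter()
    for label, w in zip(labels, weights):
        freq[label] += w
    return {label: w / total for label, w in freq.items()}
```

For example, four structures of one kinase labelled DFG-in and one structure of another kinase labelled DFG-out give an unweighted DFG-in frequency of 0.8 but a weighted frequency of 0.5, so the over-represented kinase no longer dominates.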
|
6 |
Applying Data Science to High-throughput Protein Function Prediction (劉義瑋, Liu, Yi-Wei; Unknown Date)
Biological data have grown explosively since the completion of the Human Genome Project and the advent of next-generation sequencing. Protein sequences are among the gene products being discovered in large numbers, but annotating protein function with wet-lab experiments is time-consuming, so the functions of many proteins with known sequences remain unknown. Computational function prediction can help wet labs formulate biological hypotheses and prioritize experiments, thereby speeding up protein function annotation. The Gene Ontology (GO) is a widely used framework for unifying the representation of gene function; it classifies functions into three domains: Biological Process, Cellular Component, and Molecular Function. Each domain is a hierarchy composed of labels known as GO terms. Protein function prediction can therefore be treated as a multi-label classification problem: given a protein sequence, predict its GO terms. We propose a machine learning framework that predicts protein function based on sequence homology, can incorporate protein family information, and uses various voting mechanisms to resolve the multi-label prediction problem.
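Since the abstract leaves the voting mechanisms unspecified, the following Python sketch shows two generic options for turning homologous hits into multi-label GO predictions: a simple majority vote and a similarity-weighted vote. The input format (a list of similarity score and GO-term set pairs, as might be parsed from a homology search), the placeholder term IDs and the threshold are assumptions, not the author's framework.

```python
from collections import defaultdict


def majority_vote(hits, min_fraction=0.5):
    """Predict a GO term if at least `min_fraction` of homologous hits carry it.

    `hits` is a list of (similarity_score, set_of_go_terms) pairs.
    """
    counts = defaultdict(int)
    for _, terms in hits:
        for t in terms:
            counts[t] += 1
    return {t for t, c in counts.items() if c / len(hits) >= min_fraction}


def score_weighted_vote(hits):
    """Score each GO term by the similarity-weighted fraction of hits carrying it."""
    total = sum(score for score, _ in hits)
    if total == 0:
        return {}
    scores = defaultdict(float)
    for score, terms in hits:
        for t in terms:
            scores[t] += score / total
    return dict(scores)
```

The weighted variant returns a score per term rather than a hard set, so a threshold can be tuned per GO domain; the majority vote treats a distant homolog the same as a close one.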
|
7 |
A computational investigation of solubility, functionality and the adaptation in subcellular compartments of proteins (Chan, Pedro; January 2011)
A cell is considered the smallest unit of life. It carries out a variety of biochemical reactions through the activities of proteins, including enzymes. To perform their functions, proteins must be in their native folded state and under the correct environmental conditions. A slight change in pH or temperature can disrupt the electrostatic interactions within a protein, leading to conformational change and loss of activity. Studies have shown that solubility can be enhanced by increasing the number of charges on the protein surface, and studies of extremophiles indicate that the presence of non-polar aromatic residues may be key to protein thermostability. Thus, charges are important in determining the function and adaptation of proteins.
Over the decades, a large amount of protein sequence and structure information has been produced. Using algorithms and computational and statistical techniques, these data can be analysed to solve biological problems. Such investigations are often based mainly on sequences, since sequences far outnumber the available structures. However, adding structures allows us to investigate problems such as the relationship between charges, sequence, structure and function, which is the aim of this study.
In this thesis, the relationships between proteins and function were examined using various electrostatic features derived from charges, as well as geometric properties derived from structures. One interesting finding is that the average pH of maximum stability of proteins within a subcellular location was highly correlated with the pH of that subcellular compartment, owing to the pKas of histidines and their locations on the proteins. We also found that the size of the largest non-charged patch on the protein surface correlates with solubility and provides a predictor with a maximum accuracy of 76%.
The use of novel charge-based methods shows little improvement in distinguishing between enzymes and non-enzymes. However, the method of using real charges with a grid size of 1 Å paves the way for using charge and dipole patterns at enzyme active sites to distinguish between different enzymes. Finally, a web tool that displays conserved residues on 3D protein structures has been made publicly available to help identify residues that may be functionally important.
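The "largest non-charged patch" feature described above can be sketched as a connected-component search over a surface-residue adjacency graph. This sketch assumes the adjacency has already been computed (for example from a distance cutoff), measures patch size in residues rather than surface area, and treats only Asp, Glu, Lys and Arg as charged; the thesis's exact definition may differ.

```python
from collections import deque

# Assumption: Asp, Glu, Lys and Arg count as charged; His is treated as neutral.
CHARGED = set("DEKR")


def largest_uncharged_patch(residues, neighbours):
    """Size (in residues) of the largest connected set of uncharged surface residues.

    `residues` maps residue id -> one-letter amino-acid code.
    `neighbours` maps residue id -> iterable of adjacent surface residue ids.
    """
    uncharged = {r for r, aa in residues.items() if aa not in CHARGED}
    seen, best = set(), 0
    for start in uncharged:
        if start in seen:
            continue
        # Breadth-first search over uncharged residues only.
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            r = queue.popleft()
            size += 1
            for nb in neighbours.get(r, ()):
                if nb in uncharged and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best
```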
|
8 |
Machine Learning Approaches Towards Protein Structure and Function Prediction (Aashish Jain, 10933737; 04 August 2021)
Proteins drive almost all biological processes in the cell. The functions of a protein depend on its three-dimensional structure, and elucidating the structure and function of proteins is key to understanding how a biological system operates. In this research, we developed computational methods using machine learning techniques to predict the structure and function of proteins. Protein 3D structure prediction has advanced significantly in recent years, largely due to deep learning approaches that predict inter-residue contacts and, more recently, distances using multiple sequence alignments (MSAs). The performance of these models depends on the number of sequences similar to the query protein; in some cases similar sequences are few, but dissimilar sequences with local similarities are more numerous and can be helpful. We developed a novel deep learning-based approach, AttentiveDist, which further improves over the previous state of the art. We added an attention mechanism through which dissimilar sequences are also used (increasing the number of sequences), and the model itself determines which information from such sequences it should attend to. We showed that the improvement in distance predictions successfully transferred to better protein tertiary structure modeling. We also show that structure prediction from a predicted distance map can be further enhanced by using predicted inter-residue side-chain center distances and main-chain hydrogen bonds. Protein function prediction is another avenue we explored, where the goal is to predict the functions that a protein performs. The crux of the approach is to predict the function of a protein based on the functions of similar sequences. Here, we developed a method that uses dissimilar sequences to extract additional information and improve performance over previous approaches. We used phylogenetic analysis to determine whether a dissimilar sequence can be close to the query sequence and thus provide functional information. Our method was ranked highly in the worldwide protein function prediction competition CAFA3 (2016-2019). Further, we extended the method with a neural network to predict protein toxicity, which can be used as a safety check for human-designed protein sequences.
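The attention step described for AttentiveDist, where the model decides which sequence information to attend to, can be illustrated with scaled dot-product attention over several alternative feature vectors, for example features derived from MSAs built with different E-value cutoffs (an assumption about how the inputs are grouped). The query vector here is a hypothetical stand-in for parameters a network would learn, and the sketch omits the deep network around the attention step.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(feature_sets, query):
    """Combine alternative feature vectors with scaled dot-product attention.

    `feature_sets` is a list of equal-length vectors (one per MSA variant);
    `query` is a vector of the same length. Returns (weights, combined).
    """
    scale = math.sqrt(len(query))
    scores = [sum(f * q for f, q in zip(feats, query)) / scale for feats in feature_sets]
    weights = softmax(scores)
    # Weighted sum of the feature vectors, one output dimension at a time.
    combined = [sum(w * feats[d] for w, feats in zip(weights, feature_sets))
                for d in range(len(query))]
    return weights, combined
```

Because the weights are computed from the inputs, the combination can lean on a larger but noisier alignment when it is informative and fall back to a stricter one otherwise, instead of committing to a single alignment in advance.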
|
9 |
Detection of Related Proteins with Modified Function (Hon, Jiří; January 2015)
Protein engineering is a young, dynamic discipline with a great many potential practical applications. However, its success depends primarily on thorough knowledge and use of all existing information about protein function and structure. To achieve this, protein engineering is supported by many bioinformatic tools and analyses. The goal of this project is to create a new tool for protein engineering that enables researchers to identify related proteins with modified function in ever-growing biological databases. The tool is designed as an automated workflow of existing bioinformatic analyses that identifies proteins with the same type of enzymatic function but with slightly modified properties, primarily in terms of selectivity, reaction speed and stability.
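As a hedged illustration of one filtering step such a workflow might contain, the sketch below keeps homologs that share the query's enzyme class to the third EC level but are not near-identical, on the assumption that these are likely to keep the same type of enzymatic function while differing in selectivity, rate or stability. The `Hit` record, the identity and coverage thresholds and the EC-based criterion are all hypothetical choices, not the tool's actual design.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical homology-search hit record."""
    accession: str
    identity: float   # fraction of identical aligned residues, 0..1
    coverage: float   # fraction of the query covered by the alignment, 0..1
    ec_number: str    # e.g. "3.8.1.5"


def related_candidates(hits, query_ec, min_identity=0.3, max_identity=0.9, min_coverage=0.8):
    """Accessions of hits that plausibly share the query's enzymatic function
    (same EC class down to the third level) but are not near-identical."""
    prefix = query_ec.split(".")[:3]
    return [h.accession for h in hits
            if min_identity <= h.identity <= max_identity
            and h.coverage >= min_coverage
            and h.ec_number.split(".")[:3] == prefix]
```

The upper identity bound is what separates "related with modified function" from near-duplicates of the query; the thresholds would need tuning against known enzyme variants.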
|
10 |
Protein Function Prediction Using Decision Tree Technique (Yedida, Venkata Rama Kumar Swamy; 02 September 2008)
No description available.
|