1

PREDICTION OF PROTEIN FUNCTION USING TEXT FEATURES EXTRACTED FROM THE BIOMEDICAL LITERATURE

Wong, Andrew 25 April 2013
Proteins perform many important functions in the cell and are essential to the health of the cell and the organism. As such, there is much effort to understand the function of proteins. Due to the advances in sequencing technology, there are many sequences of proteins whose function is yet unknown. Therefore, computational systems are being developed and used to help predict protein function. Most computational systems represent proteins using features that are derived from protein sequence or protein structure to predict function. In contrast, there are very few systems that use the biomedical literature as a source of features. Earlier work demonstrated the utility of biomedical literature as a source of text features for predicting protein subcellular location. In this thesis we build on that earlier work, and examine the effectiveness of using text features to predict protein function. Using the molecular function and biological process terms from the Gene Ontology (GO) as our function classes, we trained two classifiers (k-Nearest Neighbour and Support Vector Machines) to predict protein function. The proteins were represented using text features that were extracted from biomedical abstracts based on statistical properties. For evaluation, the performance of our two classifiers was compared to that of two baseline classifiers: one that assigns function based solely on the prior distribution of protein function, and one that assigns function based on sequence similarity. The systems were trained and tested using 5-fold cross-validation over a dataset of more than 36,000 proteins. Overall, we show that text features extracted from biomedical literature can be used to predict protein function for any organism. Our results also show that our text-based classifier typically has comparable performance to the sequence-similarity baseline classifier. 
Based on our results and what previous work had shown, we believe that text features can be integrated with other types of features to provide more accurate predictions for protein function. / Thesis (Master, Computing) -- Queen's University, 2013-04-24 21:07:13.983
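The text-based k-Nearest Neighbour classification described in this abstract can be illustrated with a minimal sketch. It assumes plain bag-of-words counts and cosine similarity, which are stand-ins chosen for illustration; the thesis selects its text features by statistical properties that the abstract does not detail. The function names and the toy abstracts and labels below are hypothetical.

```python
from collections import Counter
from math import sqrt

def bag_of_words(text):
    """Lower-case whitespace tokenisation into term counts (an assumed feature scheme)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_predict(query_text, labelled_texts, k=3):
    """Return the majority label among the k training texts most similar to the query."""
    q = bag_of_words(query_text)
    ranked = sorted(
        labelled_texts,
        key=lambda item: cosine(q, bag_of_words(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy, made-up "abstracts" and function labels, for illustration only.
train = [
    ("kinase phosphorylates substrate atp binding", "kinase activity"),
    ("atp dependent kinase signalling phosphorylation", "kinase activity"),
    ("membrane transporter ion channel transport", "transporter activity"),
    ("ion transport across membrane channel", "transporter activity"),
]
prediction = knn_predict("novel kinase binds atp", train, k=3)
print(prediction)
```

In a realistic setting the same idea would be evaluated with cross-validation over many proteins and compared against baselines, as the abstract describes.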
2

Learning 3D structures for protein function prediction

Muttakin, Md Nurul 05 1900
Machine learning models such as AlphaFold can generate a protein's 3D conformation from its primary sequence with accuracy approaching that of experiments, which has prompted a number of studies that predict protein function from 3D structure. Almost all of these works use graph neural networks (GNNs) to learn the 3D structure of proteins from 2D contact maps/graphs. Most of them also use rich 1D features such as ESM and LSTM embeddings in addition to the contact graph. These rich 1D features essentially obscure the learning capability of the GNNs. In this thesis, we evaluate the learning capabilities of GCNs from contact map graphs in the existing framework, where we attempt to incorporate distance information for better predictive performance. We found that GCNs fall far short of a 1D-CNN without language models, even with distance information. Consequently, we further investigate the capability of GCNs to distinguish subgraph patterns corresponding to InterPro domains. We found that GCNs perform better than a highly rich sequence embedding with an MLP in recognizing the structural patterns. Finally, we investigate the capability of GCNs to predict GO terms (functions) individually. We found that GCNs perform almost on par in identifying GO terms in the presence of only hard positive and hard negative examples. We also identified some GO terms that neither GCNs nor ESM2-based MLP models can distinguish. This gives rise to new research questions for future work.
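As a rough illustration of the contact map graphs mentioned in this abstract, the sketch below builds a binary contact map and a distance-weighted edge list from per-residue 3D coordinates. The 8 angstrom cutoff, the function names, and the toy coordinates are assumptions made for the example; the abstract does not state which threshold or atom representation the thesis uses.

```python
from math import dist

def contact_map(coords, threshold=8.0):
    """Binary adjacency matrix: 1 where two distinct residues lie within `threshold`."""
    n = len(coords)
    return [
        [1 if i != j and dist(coords[i], coords[j]) <= threshold else 0 for j in range(n)]
        for i in range(n)
    ]

def weighted_edges(coords, threshold=8.0):
    """Edges (i, j, distance) for contacting residue pairs with i < j.

    Keeping the distance on each edge is one simple way to pass distance
    information to a graph model instead of a purely binary contact."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = dist(coords[i], coords[j])
            if d <= threshold:
                edges.append((i, j, d))
    return edges

# Toy coordinates (in angstroms) for four residues along a line; not real data.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(coords)
edges = weighted_edges(coords)
print(cmap)
print(edges)
```

The last residue is farther than the cutoff from all others, so it ends up as an isolated node in the graph.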
3

Pattern Oriented Methods for Inferring Protein Annotations within Protein Interaction Networks

Kirac, Mustafa January 2009
No description available.
4

Predicting Protein Functions From Interactions Using Neural Networks and Ontologies

Qathan, Shahad 22 November 2022
To understand the process of life, it is crucial for us to study proteins and their functions. Proteins execute (almost) all cellular activities, and their functions are standardized by Gene Ontology (GO). The amount of discovered protein sequences grows rapidly as a consequence of the fast rate of development of technologies in gene sequencing. In UniProtKB, there are more than 200 million proteins. Still, less than 1% of the proteins in the UniProtKB database are experimentally GO-annotated, which is the result of the exorbitant cost of biological experiments. To minimize the large gap, developing an efficient and effective method for automatic protein function prediction (AFP) is essential. Many approaches have been proposed to solve the AFP problem. Still, these methods suffer from limitations in the way the knowledge of the domain is presented and what type of knowledge is included. In this work, we formulate the task of AFP as an entailment problem and exploit the structure of the related knowledge in a set and reusable framework. To achieve this goal, we construct a knowledge base of formal GO axioms and protein-protein interactions to use as background knowledge for AFP. Our experiments show that the approach proposed here, which allows for ontology awareness, improves results for AFP of proteins; they also show the importance of including protein-protein interactions for predicting the functions of proteins.
5

Redundancy-aware learning of protein structure-function relationships

Bryant, Drew 13 May 2013
The protein kinases are a large family of enzymes that play a fundamental role in propagating signals within the cell. Because of the high degree of binding site similarity shared among protein kinases, designing drug compounds with high specificity among the kinases has proven difficult. However, computational approaches to comparing the 3-dimensional geometry and physicochemical properties of key binding site residues, referred to here as substructures, have been shown to be informative of inhibitor selectivity. This thesis introduces two fundamental approaches for the comparative analysis of substructure similarity and demonstrates the importance of each method on a variety of large protein structure datasets for multiple biological applications. The Family-wise Alignment of SubStructural Templates Framework (The FASST Framework) provides an unsupervised learning approach for identifying substructure clusterings. The substructure clusterings identified by FASST allow for the automatic evaluation of substructure variability, the identification of distinct structural conformations and the selection of anomalous outlier structures within large structure datasets. These clusterings are shown to be capable of identifying biologically meaningful structure trends among a diverse number of protein families. The FASST Live visualization and analysis platform provides multiple comparative analysis pipelines and allows the user to interactively explore the substructure clusterings computed by FASST. The Combinatorial Clustering Of Residue Position Subsets (CCORPS) method provides a supervised learning approach for identifying structural features that are correlated with a given set of annotation labels. The ability of CCORPS to identify structural features predictive of functional divergence among families of homologous enzymes is demonstrated across 48 distinct protein families. 
The CCORPS method is further demonstrated to generalize to the very difficult problem of predicting protein kinase inhibitor affinity. CCORPS is demonstrated to make perfect or near-perfect predictions for the binding ability of 12 of the 38 kinase inhibitors studied, while only having overall poor predictive ability for 1 of the 38 compounds. Additionally, CCORPS is shown to identify shared structural features across phylogenetically diverse groups of kinases that are correlated with binding affinity for particular inhibitors; such instances of structural similarity among phylogenetically diverse kinases are also shown to not be rare among kinases. Finally, these function-specific structural features may serve as potential starting points for the development of highly specific kinase inhibitors. Importantly, both The FASST Framework and CCORPS implement a redundancy-aware approach to dealing with structure overrepresentation that allows for the incorporation of all available structure data. As shown in this thesis, surprising structural variability exists even among structure datasets consisting of a single protein sequence. By incorporating the full variety of structural conformations within the analysis, the methods presented here provide a richer view of the variability of large protein structure datasets.
6

基於資料科學方法之巨量蛋白質功能預測 / Applying Data Science to High-throughput Protein Function Prediction

劉義瑋, Liu, Yi-Wei Unknown Date
Since the completion of the Human Genome Project and the arrival of next-generation sequencing, biological data has grown explosively, and protein sequences are among the gene products being discovered in large numbers. Annotating protein function with wet-lab experiments is extremely time-consuming, so many proteins have a known sequence but an unknown function. Predicting likely functions computationally before running experiments can help biologists formulate hypotheses and prioritize experiments, which speeds up protein function annotation. Gene Ontology (GO) is a widely used framework for unifying the representation of gene product function; it classifies functions into three domains, namely Biological Process, Cellular Component, and Molecular Function. Each domain is a hierarchical tree composed of labels known as GO terms. Protein function prediction can be treated as a multi-label classification problem: given a protein sequence, predict its GO terms. We propose a machine learning framework that predicts protein function based on sequence homology and can also incorporate protein family information, and we design several voting mechanisms to resolve the multi-label prediction problem.
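One way to picture a voting mechanism for multi-label GO prediction is the sketch below: each homologous hit votes for its own GO terms with a weight, and terms whose share of the total weight reaches a cutoff are predicted. The weighting scheme, the 0.5 cutoff, the function name, and the toy hits are assumptions for illustration; the abstract does not specify the voting rules used in the thesis.

```python
from collections import defaultdict

def vote_go_terms(hits, min_share=0.5):
    """Weighted voting over homologous hits.

    hits: list of (weight, set_of_go_terms), where the weight is some
    similarity score for the hit (assumed here; any non-negative number works).
    Returns {go_term: vote_share} for terms whose share is at least min_share.
    """
    total = sum(weight for weight, _ in hits)
    if total == 0:
        return {}
    scores = defaultdict(float)
    for weight, terms in hits:
        for term in terms:
            scores[term] += weight
    return {t: s / total for t, s in scores.items() if s / total >= min_share}

# Toy hits with example GO identifiers; the weights are made up.
hits = [
    (0.9, {"GO:0005524", "GO:0004672"}),
    (0.6, {"GO:0005524"}),
    (0.5, {"GO:0005215"}),
]
predicted = vote_go_terms(hits)
print(predicted)
```

Lowering `min_share` trades precision for recall, since more weakly supported terms pass the cutoff.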
7

A computational investigation of solubility, functionality and the adaptation in subcellular compartments of proteins

Chan, Pedro January 2011
A cell is considered to be the smallest unit of life. It carries out a variety of biochemical reactions through the activities of proteins and protein enzymes. In order to perform their functions, proteins must be in their native folded state under the correct environmental conditions. A slight change in pH or temperature could disrupt the electrostatic interactions within the protein, leading to conformational change and loss of activity. Studies have shown that solubility could be enhanced by increasing the number of charges on the protein surface. From studies of extremophiles, we have also learned that the presence of non-polar aromatic residues could be a key to thermostable proteins. Thus, charges are important in determining the function and adaptation of proteins. Over the decades, a large amount of protein sequence and structure information relating to molecular biology has been produced. By employing algorithms and computational and statistical techniques, it is possible to analyse these data to solve biological problems. Often these investigations are based mainly on sequences, since their numbers outstrip the number of available structures. However, adding structures would allow us to investigate problems such as the relationship between charges, sequence, structure and function, which is the aim of this study. In this thesis, the relationships between proteins and function were examined using various electrostatic features derived from charges and geometric properties derived from structures. One interesting finding is that the average pH of maximum stability of proteins within a subcellular location was highly correlated with the pH of that subcellular compartment, which was due to pKas (of histidines) and their locations on the proteins. We also found that the size of the largest non-charged patch on the protein surface correlates with solubility and provides a predictor with a maximum accuracy of 76%.
The use of novel charge-based methods shows little improvement in distinguishing between enzymes and non-enzymes. However, the method of using real charges with grid size of 1 angstrom has paved a way into the idea of using charges and dipoles pattern from enzyme active site to distinguish different enzymes. Finally, a web-tool for displaying conserved residues on 3D protein structure is made available to the public for identifying residues that may be of functional importance.
8

Machine Learning Approaches Towards Protein Structure and Function Prediction

Aashish Jain (10933737) 04 August 2021
Proteins are drivers of almost all biological processes in the cell. The functions of a protein depend on its three-dimensional structure, and elucidating the structure and function of proteins is key to understanding how a biological system operates. In this research, we developed computational methods using machine learning techniques to predict the structure and function of proteins. Protein 3D structure prediction has advanced significantly in recent years, largely due to deep learning approaches that predict inter-residue contacts and, more recently, distances using multiple sequence alignments (MSAs). The performance of these models depends on the number of protein sequences similar to the query protein; in some cases similar sequences are few, whereas dissimilar sequences with local similarities are more numerous and can be helpful. We have developed a novel deep learning-based approach, AttentiveDist, which further improves over the previous state of the art. We added an attention mechanism in which dissimilar sequences are also used (increasing the number of sequences) and the model itself determines which information from such sequences it should attend to. We showed that the improvement in distance predictions was successfully transferred to achieve better protein tertiary structure modeling. We also show that structure prediction from a predicted distance map can be further enhanced by using predicted inter-residue sidechain center distances and main-chain hydrogen bonds. Protein function prediction is another avenue we explored, where we want to predict the function that a protein will perform. The crux of the approach is to predict the function of a protein based on the function of similar sequences. Here, we developed a method that uses dissimilar sequences to extract additional information and improve performance over previous approaches.
We used phylogenetic analysis to determine whether a dissimilar sequence can be close to the query sequence and thus provide functional information. Our method was ranked highly in the worldwide protein function prediction competition CAFA3 (2016-2019). Further, we expanded the method with a neural network to predict protein toxicity, which can be used as a safety check for human-designed protein sequences.
9

Vyhledávání příbuzných proteinů s modifikovanou funkcí / Detection of Related Proteins with Modified Function

Hon, Jiří January 2015
Protein engineering is a young, dynamic discipline with many potential practical applications. However, its success depends primarily on thorough knowledge and use of all existing information about protein function and structure. To that end, protein engineering is supported by many bioinformatic tools and analyses. The goal of this project is to create a new tool for protein engineering that enables researchers to identify related proteins with modified function in ever-growing biological databases. The tool is designed as an automated workflow of existing bioinformatic analyses that leads to the identification of proteins with the same type of enzymatic function but with slightly modified properties, primarily in terms of selectivity, reaction speed and stability.
10

An Interdisciplinary Approach: Computational Sequence Motif Search and Prediction of Protein Function with Experimental Validation

Choi, Hyunjin 29 October 2013
Pathogens colonize their hosts by releasing molecules that can enter host cells. A biotrophic oomycete plant pathogen, Phytophthora sojae harbors a superfamily of effector genes whose protein products enter the cells of the host, soybean. Many of the effectors contain an RXLR-dEER motif in their N-terminus. More than 400 members belonging to this family have been previously identified using a Hidden Markov Model. Amino acids flanking the RXLR motif have been utilized to identify effector proteins from the P. sojae secretome, despite the high level of sequence divergence among the members of this protein family. I present here machine learning methods to identify protein candidates that belong to a particular class, such as the effector superfamily. Converting the flanking amino acid sequences of RXLR motifs (or other candidate motifs) into numeric values that reflect their physical properties enabled the protein sequences to be analyzed through these methods. The methods evaluated include Support Vector Machines and a related spherical classification method that I have developed. I also approached the effector prediction problem by building functional linkage networks and have produced lists of predicted P. sojae effector proteins. I tested the best candidate through gene gun bombardment assays using the beta-glucuronidase reporter system, which revealed that there is a high likelihood that the candidate can enter the soybean cells. / Ph. D.
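The step of converting amino acids flanking an RXLR motif into numeric values can be sketched as follows. This example assumes the Kyte-Doolittle hydropathy scale as the physical property and a window of three residues on each side; both are illustrative choices, since the abstract does not say which properties or window sizes the thesis uses. The function name and the toy sequence are hypothetical.

```python
import re

# Kyte-Doolittle hydropathy index per amino acid (used here as an assumed property).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def encode_flanks(seq, motif=r"R.LR", width=3):
    """For each motif match, return the hydropathy values of up to `width`
    residues on the left followed by up to `width` residues on the right."""
    vectors = []
    for m in re.finditer(motif, seq):
        left = seq[max(0, m.start() - width):m.start()]
        right = seq[m.end():m.end() + width]
        vectors.append([KYTE_DOOLITTLE[aa] for aa in left + right])
    return vectors

# Toy sequence containing one RXLR occurrence ("RSLR"); not a real effector.
features = encode_flanks("MASLRSLRAVGK")
print(features)
```

Numeric vectors of this kind can then be passed to a classifier such as a Support Vector Machine, which is the kind of method the abstract evaluates.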
