Learning 3D structures for protein function prediction

Machine learning models such as AlphaFold can generate a protein's 3D conformation
from its primary sequence at up to experimental accuracy, which has given rise to a
growing body of research on predicting protein function from 3D structure. Almost
all of these works use graph neural networks (GNNs) to learn the 3D
structure of proteins from 2D contact maps or graphs. Most of them also use
rich 1D features, such as ESM and LSTM embeddings, in addition to the contact
graph. These rich 1D features obscure how much the GNNs themselves
learn. In this thesis, we evaluate the learning capabilities of graph
convolutional networks (GCNs) on contact map graphs within the existing framework,
and we attempt to incorporate distance information for better predictive
performance. We found that, without language models, GCNs fall far short of a
1D-CNN, even with distance information.
Consequently, we further investigate the capability of GCNs to distinguish subgraph
patterns corresponding to InterPro domains. We found that GCNs
perform better than an MLP on rich sequence embeddings at recognizing
these structural patterns. Finally, we investigate the capability of GCNs to predict
GO-terms (functions) individually. We found that GCNs perform almost
on par in identifying GO-terms when only hard positive and hard
negative examples are present. We also identified some GO-terms that GCNs
and ESM2-based MLP models cannot distinguish. These findings raise new research
questions for future work.
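
For readers unfamiliar with the setup described above, the following is a minimal illustrative sketch of how a contact map graph can be derived from residue coordinates and passed through a single graph-convolution step. It is not the thesis's implementation: the function names `contact_map` and `gcn_layer`, the 10 Å threshold, the one-coordinate-per-residue input, and the symmetric normalization are all assumptions chosen for illustration.

```python
import numpy as np


def contact_map(coords, threshold=10.0):
    """Build a binary contact map from an (N, 3) array of residue coordinates.

    The 10 Angstrom threshold is an assumed, commonly used cutoff, not a value
    taken from the thesis. Returns (adjacency, pairwise_distances).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist < threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-contacts; self-loops are added in the layer
    return adj, dist


def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ features @ weights, 0.0)


# Toy example: three residues close together and one far away.
coords = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [30.0, 0.0, 0.0]]
adj, dist = contact_map(coords)
features = np.eye(4)           # placeholder one-hot node features (assumption)
weights = np.ones((4, 2))      # placeholder weights; a real model would learn these
hidden = gcn_layer(adj, features, weights)
```

The `dist` matrix returned alongside the adjacency is one place where distance information could enter the model, for example as edge weights; how the thesis actually incorporates distances is not stated in this abstract.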

Identifier: oai:union.ndltd.org:kaust.edu.sa/oai:repository.kaust.edu.sa:10754/691654
Date: 05 1900
Creators: Muttakin, Md Nurul
Contributors: Hoehndorf, Robert; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Ombao, Hernando; Elhoseiny, Mohamed
Source Sets: King Abdullah University of Science and Technology
Language: English
Detected Language: English
Type: Thesis
Rights: 2024-05-11, At the time of archiving, the student author of this thesis opted to temporarily restrict access to it. The full text of this thesis will become available to the public after the expiration of the embargo on 2024-05-11.
Relation: N/A