Machine learning models such as AlphaFold can predict protein 3D conformation
from primary sequence with near-experimental accuracy, which has given rise to
a number of studies that predict protein functions from 3D structures. Almost
all of these works use graph neural networks (GNNs) to learn the 3D
structures of proteins from 2D contact maps/graphs. Most of them also use
rich 1D features, such as ESM and LSTM embeddings, in addition to the contact
graph. These rich 1D features essentially obscure the learning capability
of the GNNs themselves. In this thesis, we evaluate the learning capabilities of graph convolutional networks (GCNs) from
contact map graphs in the existing framework, where we attempt to incorporate
distance information for better predictive performance. We found that GCNs fall
far short of 1D-CNNs without language models, even with distance information.
Consequently, we further investigate the capabilities of GCNs to distinguish subgraph
patterns corresponding to InterPro domains. We found that GCNs
perform better than an MLP on highly rich sequence embeddings in recognizing
these structural patterns. Finally, we investigate the capability of GCNs to predict
GO terms (functions) individually. We found that GCNs perform almost
on par in identifying GO terms in the presence of only hard positive and hard
negative examples. We also identified some GO terms that neither GCNs
nor ESM2-based MLP models can distinguish. This gives rise to new research
questions to be investigated in future work.
Identifier | oai:union.ndltd.org:kaust.edu.sa/oai:repository.kaust.edu.sa:10754/691654 |
Date | 05 1900 |
Creators | Muttakin, Md Nurul |
Contributors | Hoehndorf, Robert, Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division, Ombao, Hernando, Elhoseiny, Mohamed |
Source Sets | King Abdullah University of Science and Technology |
Language | English |
Detected Language | English |
Type | Thesis |
Rights | 2024-05-11, At the time of archiving, the student author of this thesis opted to temporarily restrict access to it. The full text of this thesis will become available to the public after the expiration of the embargo on 2024-05-11. |
Relation | N/A |