
De novo genome-scale prediction of protein-protein interaction networks using ontology-based background knowledge

Proteins and their functions play essential roles in many biological processes, which makes the study of protein-protein interactions (PPIs) highly important. PPI network data are of great scientific value; however, they are incomplete, and experimental identification is time-consuming and expensive. Available computational methods perform well at predicting PPIs in model organisms but poorly for novel organisms. Because interaction data are incomplete, it is challenging to train a model for a novel organism. In addition, millions to billions of candidate interactions need to be verified, which is extremely compute-intensive.
We aim to improve the performance of predicting whether a pair of proteins will interact, given only the two sequences as input, and to efficiently predict a PPI network given the sequences of a proteome as input.
We hypothesize that information about the cellular locations where proteins are active and about the proteins' 3D structures can significantly improve prediction performance.
To overcome the lack of experimental data, we use structures predicted by AlphaFold2 and cellular locations predicted by DeepGOPlus.
We believe that proteins belonging to disjoint biological components have very little chance of interacting. We manually chose several pairs of disjoint components and confirmed this assumption using experimental PPI data.
We generate new non-interacting pairs from disjoint classes to update the D-SCRIPT dataset. As a result, the AUPR improves by 10% compared to the D-SCRIPT dataset. In addition, for de novo PPI network prediction, we pre-filter the negatives instead of enumerating all potential PPIs. For E. coli, we can filter out around a million negative interactions.
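The pre-filtering idea can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the component labels, and the rule that every pairing of the two proteins' predicted locations must be disjoint are assumptions made for illustration.

```python
from itertools import combinations


def is_presumed_negative(locs_a, locs_b, disjoint_pairs):
    """Return True if every pairing of the two proteins' locations is disjoint.

    Assumption: a pair is pre-filtered only when the proteins share no
    potentially compatible location at all.
    """
    if not locs_a or not locs_b:
        return False  # no annotation: do not pre-filter
    return all(
        frozenset((a, b)) in disjoint_pairs for a in locs_a for b in locs_b
    )


def split_candidates(annotations, disjoint_pairs):
    """Split all protein pairs into presumed negatives and pairs left to score."""
    negatives, to_score = [], []
    for p, q in combinations(sorted(annotations), 2):
        if is_presumed_negative(annotations[p], annotations[q], disjoint_pairs):
            negatives.append((p, q))
        else:
            to_score.append((p, q))
    return negatives, to_score


# Hypothetical example data: the labels and the disjointness rule are
# illustrative only.
disjoint = {frozenset({"nucleus", "extracellular region"})}
annotations = {
    "P1": {"nucleus"},
    "P2": {"extracellular region"},
    "P3": {"nucleus", "cytoplasm"},
}
negatives, to_score = split_candidates(annotations, disjoint)
```

Only the pairs in `to_score` would then be passed to the more expensive interaction predictor.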
To combine structure and sequence information, we generate a graph for each protein. A graph convolutional network with Self-Attention Graph Pooling in a Siamese architecture learns from these graphs for PPI prediction. In this way, we improve the AUPR by around 20% compared to our baseline model, D-SCRIPT.
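The abstract does not say how the per-protein graphs are built. One common construction, assumed here purely for illustration, treats residues as nodes labeled by amino acid and connects two residues when their C-alpha atoms lie within a distance cutoff in the predicted structure. The function name `contact_graph` and the 8.0 Å cutoff are hypothetical choices, not values from the thesis.

```python
import math


def contact_graph(sequence, ca_coords, cutoff=8.0):
    """Build a residue contact graph from a sequence and C-alpha coordinates.

    Nodes carry the sequence information (one amino-acid letter per residue);
    edges carry the structure information (pairs within `cutoff` angstroms).
    The cutoff value is an assumption for illustration.
    """
    if len(sequence) != len(ca_coords):
        raise ValueError("sequence and coordinates must have the same length")
    n = len(sequence)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                edges.append((i, j))
    return {"nodes": list(sequence), "edges": edges}


# Toy example with made-up coordinates (in angstroms).
graph = contact_graph("ACD", [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)])
```

In a Siamese setup, the graphs of both proteins in a pair would be encoded by the same network before their representations are compared.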

Identifier: oai:union.ndltd.org:kaust.edu.sa/oai:repository.kaust.edu.sa:10754/679772
Date: 18 July 2022
Creators: Niu, Kexin
Contributors: Hoehndorf, Robert; Biological and Environmental Science and Engineering (BESE) Division; Inal, Sahika; Moshkov, Mikhail
Source Sets: King Abdullah University of Science and Technology
Language: English
Detected Language: English
Type: Thesis
Rights: 2023-07-21. At the time of archiving, the student author of this thesis opted to temporarily restrict access to it. The full text of this thesis will become available to the public after the expiration of the embargo on 2023-07-21.
