Prediction of Protein-protein Interactions and Essential Genes through Data Integration

The currently known network of human protein-protein interactions (PPIs) is providing new insights into diseases and helping to identify potential therapies. However, according to several estimates, the known interaction network may represent only 10% of the entire interactome, indicating that more comprehensive knowledge of the interactome could have a major impact on understanding and treating diseases. The primary aim of this thesis was to develop computational methods to provide increased coverage of the interactome. A secondary aim was to gain a better understanding of the link between networks and phenotype by analyzing essential mouse genes.

Two algorithms were developed to predict PPIs and provide increased coverage of the interactome: FpClass and mixed co-expression. FpClass differs from previous PPI prediction methods in two key ways: it integrates both positive and negative evidence for protein interactions, and it identifies synergies between predictive features. Through these approaches, FpClass provides interaction networks with significantly improved reliability and interactome coverage. Compared to previous predicted human PPI networks, FpClass provides a network with over 10 times more interactions, about twice as many proteins, and a lower false discovery rate. This network includes 595 disease-related proteins from OMIM and Cancer Gene Census that have no previously known interactions. The second method, mixed co-expression, aims to predict transient PPIs, which have proven difficult to detect by computational and experimental methods. Mixed co-expression makes predictions using gene co-expression and performs significantly better (p < 0.05) than the previous method for predicting PPIs from co-expression. It is especially effective for identifying interactions of transferases and signal transduction proteins.

For the second aim of the thesis, we investigated the relationship between gene essentiality and diverse gene/protein features based on gene expression, PPI and gene co-expression networks, gene/protein sequence, Gene Ontology, and orthology. We identified non-redundant features closely associated with essentiality, including centrality in PPI and gene co-expression networks. We found that no single predictive feature was effective for all essential genes; most features, including centrality, were less effective for genes associated with postnatal lethality and infertility. These results suggest that understanding phenotype will require integrating measures of network topology with information about the biology of the network’s nodes and edges.

Identifier: oai:union.ndltd.org:TORONTO/oai:tspace.library.utoronto.ca:1807/29776
Date: 31 August 2011
Creators: Kotlyar, Max
Contributors: Jurisica, Igor
Source Sets: University of Toronto
Language: en_ca
Detected Language: English
Type: Thesis
