Many protein-protein interactions, especially those involved in eukaryotic signalling, are mediated by PDZ domains through the recognition of hydrophobic C-termini. The availability of experimental PDZ interaction data sets has led to the construction of computational methods to predict PDZ domain-peptide interactions. However, such predictors are best suited to predicting interactions in a single organism or for limited subsets of PDZ domains. The goal of my thesis has therefore been to build general predictors that can be used to scan the proteomes of multiple organisms for ligands of almost all PDZ domains from select model organisms. A framework consisting of four steps (data collection, feature encoding, predictor training and evaluation) was developed and applied to all predictors built in this thesis.
The first predictor used PDZ domain-peptide sequence information from two interaction data sets obtained from high-throughput protein microarray and phage display experiments in mouse and human, respectively. The second predictor used PDZ domain structure and peptide sequence information. I showed that these predictors are complementary to each other, are capable of predicting unseen interactions and can be used for proteome scanning in human, worm and fly. As both positive and negative interactions are required for building a successful predictor, a major obstacle I addressed was the generation of artificial negative interactions for training. In particular, I used position weight matrices to generate such negatives for the positive-only phage display data and used a semi-supervised learning approach to overcome the problem of over-prediction (i.e., prediction of too many positives). These predictors are available as a community web resource: http://webservice.baderlab.org/domains/POW. Finally, a Bayesian integration method combining information from different biological evidence sources was used to filter the human proteome scanning predictions from both predictors. This resulted in the construction of a comprehensive, physiologically relevant, high-confidence PDZ-mediated protein-protein interaction network in human.
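The idea of using a position weight matrix (PWM) to generate artificial negatives from positive-only data can be illustrated with a minimal sketch. This is not the thesis's actual procedure, which the abstract does not specify; the peptide length, pseudocount, uniform background, score threshold, function names and the example peptides below are all assumptions made for illustration. The sketch builds a log-odds PWM from known binding peptides and keeps random peptides that score below a threshold as putative non-binders.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(peptides, pseudocount=1.0):
    """Build a log-odds PWM (one dict per position) against a uniform background."""
    length = len(peptides[0])
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        # Start every residue at the pseudocount so unseen residues get a finite score.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for pep in peptides:
            counts[pep[pos]] += 1
        total = sum(counts.values())
        pwm.append({aa: math.log2(counts[aa] / total / background)
                    for aa in AMINO_ACIDS})
    return pwm


def score(pwm, peptide):
    """Sum the per-position log-odds scores of a peptide."""
    return sum(column[aa] for column, aa in zip(pwm, peptide))


def sample_negatives(pwm, positives, n, threshold=0.0, seed=0, max_tries=100000):
    """Draw random peptides and keep those scoring below the threshold."""
    rng = random.Random(seed)
    seen = set(positives)
    negatives = []
    for _ in range(max_tries):
        if len(negatives) == n:
            break
        candidate = "".join(rng.choice(AMINO_ACIDS) for _ in range(len(pwm)))
        if candidate not in seen and score(pwm, candidate) < threshold:
            negatives.append(candidate)
            seen.add(candidate)
    return negatives


# Hypothetical C-terminal 4-mers standing in for phage display positives.
positives = ["ETSV", "QTSV", "ETDV", "KTSL"]
pwm = build_pwm(positives)
negatives = sample_negatives(pwm, positives, n=5)
print(negatives)
```

Choosing the threshold is the main design decision in a scheme like this: a very low threshold yields negatives that are trivially different from the positives, while a threshold close to the positives' scores risks labelling true binders as negatives.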
Identifier | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OTU.1807/43599 |
Date | 09 January 2014 |
Creators | Hui, Shirley |
Contributors | Bader, Gary D. |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | en_ca |
Detected Language | English |
Type | Thesis |