Return to search

Machine Learning Approaches to Refining Post-translational Modification Predictions and Protein Identifications from Tandem Mass Spectrometry

Tandem mass spectrometry (MS/MS) is the dominant approach for large-scale peptide sequencing in high-throughput proteomic profiling studies. The computational analysis of MS/MS spectra involves the identification of peptides from experimental spectra, especially those with post-translational modifications (PTMs), as well as the inference of protein composition based on the putative identified peptides. In this thesis, we tackled two major challenges associated with an MS/MS analysis: 1) the refinement of PTM predictions from MS/MS spectra and 2) the inference of protein composition based on peptide predictions. We proposed two PTM prediction refinement algorithms, PTMClust and its Bayesian nonparametric extension \emph{i}PTMClust, and a protein identification algorithm, pro-HAP, that is based on a novel two-layer hierarchical clustering approach that leverages prior knowledge about protein function. Individually, we show that our two PTM refinement algorithms outperform the state-of-the-art algorithms and our protein identification algorithm performs at par with the state of the art. Collectively, as a demonstration of our end-to-end MS/MS computational analysis of a human chromatin protein complex study, we show that our analysis pipeline can find high confidence putative novel protein complex members. Moreover, it can provide valuable insights into the formation and regulation of protein complexes by detailing the specificity of different PTMs for the members in each complex.

Identiferoai:union.ndltd.org:TORONTO/oai:tspace.library.utoronto.ca:1807/33965
Date11 December 2012
CreatorsChung, Clement
ContributorsFrey, Brendan J.
Source SetsUniversity of Toronto
Languageen_ca
Detected LanguageEnglish
TypeThesis

Page generated in 0.0022 seconds