Global ETD Search

Machine Learning Approaches to Refining Post-translational Modification Predictions and Protein Identifications from Tandem Mass Spectrometry

Tandem mass spectrometry (MS/MS) is the dominant approach for large-scale peptide sequencing in high-throughput proteomic profiling studies. The computational analysis of MS/MS spectra involves identifying peptides from experimental spectra, especially those with post-translational modifications (PTMs), and inferring protein composition from the putatively identified peptides. In this thesis, we tackle two major challenges in MS/MS analysis: 1) the refinement of PTM predictions from MS/MS spectra and 2) the inference of protein composition based on peptide predictions. We propose two PTM prediction refinement algorithms, PTMClust and its Bayesian nonparametric extension iPTMClust, and a protein identification algorithm, pro-HAP, based on a novel two-layer hierarchical clustering approach that leverages prior knowledge about protein function. Individually, we show that our two PTM refinement algorithms outperform state-of-the-art algorithms and that our protein identification algorithm performs on par with the state of the art. Collectively, by applying our end-to-end MS/MS computational analysis to a human chromatin protein complex study, we show that our analysis pipeline can find high-confidence putative novel protein complex members. Moreover, it can provide valuable insights into the formation and regulation of protein complexes by detailing the specificity of different PTMs for the members of each complex.

http://hdl.handle.net/1807/33965

Machine Learning

Unsupervised Learning

Clustering

Mass Spectrometry

Protein Identification

Nonparametric Bayesian method

Hierarchical clustering

0800

0984

0715

Identifier	oai:union.ndltd.org:TORONTO/oai:tspace.library.utoronto.ca:1807/33965
Date	11 December 2012
Creators	Chung, Clement
Contributors	Frey, Brendan J.
Source Sets	University of Toronto
Language	en_ca
Detected Language	English
Type	Thesis
