Technological advancements in mass spectrometry that allow quantification of almost complete proteomes make proteomics a key platform for generating unique functional molecular data. Furthermore, the integrative analysis of genomic and proteomic data, termed proteogenomics, has emerged as a new field revealing insights into gene expression regulation, cell signalling, and disease processes. However, the lack of software tools for high-throughput integration and for unbiased modification and variant detection hinders large-scale proteogenomics studies. The main objective of this work is to address these issues by developing and applying new software tools and data analysis methods. Firstly, I address the mapping of peptide sequences to reference genomes. I introduce a novel tool for high-throughput mapping, highlight its unique features, which facilitate the mapping of quantitative data and post-translational modifications while accounting for amino acid substitutions, and benchmark its performance. Furthermore, I offer an additional tool that permits generation of web-accessible hubs of genome-wide mappings. To enable unbiased identification of post-translational modifications and amino acid substitutions in high-resolution mass spectrometry data, I present algorithmic updates to the mass-tolerant blind spectrum comparison tool ‘MS SMiV’. I demonstrate the applicability of the changes by benchmarking against a published mass-tolerant database search of a high-resolution tandem mass spectrometry dataset. I then present the application of ‘MS SMiV’ to a panel of 50 colorectal cancer cell lines. I show that the adapted ‘MS SMiV’ outperforms traditional sequence-database-based identification of single amino acid variants. Furthermore, I highlight the utility of mass-tolerant spectrum matching, in combination with isobaric-labelled quantitative proteomics, in distinguishing between post-translational modifications and amino acid variants of similar mass.
In the last part of this work, I integrate both tools into a high-throughput proteogenomic identification pipeline and apply it to a pilot study of chondrocytes derived from 12 osteoarthritic individuals. I show the value of this approach in identifying variation between individuals and across molecular levels, and illustrate this variation with individual examples. I also show that multiplexed proteogenomics can be used to infer the genotypes of individuals.
Identifier | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:744743 |
Date | January 2018 |
Creators | Schlaffner, Christoph Norbert |
Contributors | Bender, Andreas ; Choudhary, Jyoti |
Publisher | University of Cambridge |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | https://www.repository.cam.ac.uk/handle/1810/275137 |