
A BIOINFORMATIC TOOL FOR ANALYSING THE STRUCTURES OF PROTEIN COMPLEXES BY MEANS OF MASS SPECTROMETRY OF CROSS-LINKED PROTEINS

Multi-subunit protein complexes are involved in many essential biochemical processes
including signal transduction, protein synthesis, RNA synthesis, DNA replication and
protein degradation. An accurate description of the relative structural arrangement of
the constituent sub-units in such complexes is crucial for an understanding of the
molecular mechanism of the complex as a whole. Many complexes, however, lie in the
megadalton range and are not amenable to X-ray crystallographic or nuclear magnetic
resonance (NMR) analysis. Techniques that are suited to structural studies of such
large complexes, such as cryo-electron microscopy, do not provide the resolution
required for mechanistic insight.
Mass spectrometry (MS) has increasingly been applied to identify the residues that are
involved in chemical cross-links in compound protein assemblies, and has provided
valuable insight into the molecular arrangement, orientation and contact surfaces of
sub-units within such large complexes. This approach is known as MS3D, and
involves the MS analysis of cross-linked di-peptides following the enzymatic cleavage
of a chemically cross-linked complex. A major challenge of this approach is the
identification of the cross-linked di-peptides in a composite mixture of peptides, as well
as the identification of the residues involved in the cross-link. These analyses require
bioinformatics tools with capabilities beyond those of general, MS-based proteomic
analysis software. Many MS3D software tools have appeared, often designed for very
specific experimental methods. We review all major MS3D bioinformatics programs
currently available, considering their applicability to different workflows, specific
experimental requirements, and the computational approach taken by each.
We also developed AnchorMS, a new bioinformatics tool for the identification of both
the sequences and cross-linked residues of di-peptides within a post-digest peptide
mixture based on MS1 and MS2 data. AnchorMS is intended as a component in the
workflow of an MS3D experiment where the protein sequences, cross-linking reagent
and protease are known.
AnchorMS is freely available as a public web service at cbio.ufs.ac.za/AnchorMS via a
simple, user-friendly web interface coded in PHP/XHTML. Experimental sample
preparation information and MS data may be uploaded through the web form and
analysed by AnchorMS. After analysis, the web interface displays the di-peptides
detected, as well as the calculated maximum inter-residue distance between cross-linked
residues. This distance information can be used in the optimization of sub-unit
positioning within structural models using third party software.
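As an illustration of how such a distance could be used as a restraint, the following minimal Python sketch derives an upper bound on the C-alpha to C-alpha distance from a spacer length plus two side-chain lengths, and then flags residue pairs in a model that exceed it. This is not the AnchorMS calculation. The function names, the coordinate dictionary layout and the numeric lengths (roughly those often quoted for a BS3-type reagent and lysine side chains) are assumptions chosen only for the example.

```python
import math

def max_ca_distance(spacer_length, side_chain_a, side_chain_b):
    """Upper bound (in angstrom) on the C-alpha to C-alpha distance.

    Assumes a fully extended spacer and fully extended side chains,
    which is a simplification made for this sketch.
    """
    return spacer_length + side_chain_a + side_chain_b

def violated_restraints(coords, restraints):
    """Return the restraints that a structural model does not satisfy.

    coords     : {(chain, residue_number): (x, y, z)} C-alpha positions
    restraints : iterable of (residue_a, residue_b, max_distance)
    """
    violations = []
    for res_a, res_b, limit in restraints:
        distance = math.dist(coords[res_a], coords[res_b])
        if distance > limit:
            violations.append((res_a, res_b, round(distance, 2), limit))
    return violations

# Hypothetical values: 11.4 A spacer, about 6.5 A per lysine side chain.
limit = max_ca_distance(11.4, 6.5, 6.5)

# Hypothetical C-alpha coordinates for three residues in two sub-units.
model = {
    ("A", 10): (0.0, 0.0, 0.0),
    ("B", 5): (10.0, 0.0, 0.0),
    ("B", 20): (30.0, 0.0, 0.0),
}
restraints = [
    (("A", 10), ("B", 5), limit),
    (("A", 10), ("B", 20), limit),
]
print(violated_restraints(model, restraints))
```

A docking or modelling program would typically use the same bound as a penalty term, not as a hard filter. The list of violations above is only the simplest way to show the idea.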
The computational core of AnchorMS was developed as an open-source Python
project. We describe in detail the overall structure and workflow of the code as well as
the functionality implemented in each section of the code.
AnchorMS creates a digital library of possible di-peptides and generates expected
precursor and fragment mass spectra for each. In order to identify di-peptides, the
observed mass spectra are matched against the library of expected mass spectra.
Features that are unique to AnchorMS are highlighted, including those for the analysis
of di-peptides where the sequences are identical, but the cross-linked residues differ.
AnchorMS considers their possible co-fragmentation and employs a specialised
second score for distinguishing between such precursors.
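The library-and-match idea can be sketched in a few lines of Python. The code below is a minimal illustration, not the AnchorMS source. It assumes a trypsin-like protease, lysine-to-lysine cross-links, a DSS/BS3-type linker mass of 138.06808 Da, and that a cross-linked lysine blocks cleavage (so only internal lysines are candidate sites). All function and class names are hypothetical, and only the precursor (MS1) step is shown.

```python
from itertools import combinations_with_replacement
from typing import NamedTuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
LINKER_MASS = 138.06808  # assumed example: DSS/BS3-type cross-link

class DiPeptide(NamedTuple):
    peptide_a: str
    site_a: int   # 0-based index of the linked residue in peptide_a
    peptide_b: str
    site_b: int
    mass: float   # neutral monoisotopic mass of the cross-linked pair

def digest(sequence, missed_cleavages=1):
    """Trypsin-like digest: cleave after K/R unless followed by P."""
    sites = [i + 1 for i, aa in enumerate(sequence[:-1])
             if aa in "KR" and sequence[i + 1] != "P"]
    bounds = [0] + sites + [len(sequence)]
    peptides = set()
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.add(sequence[bounds[i]:bounds[j]])
    return sorted(peptides)

def peptide_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def link_sites(peptide):
    """Internal lysines only (assumption: a linked K is not cleaved)."""
    return [i for i, aa in enumerate(peptide[:-1]) if aa == "K"]

def build_library(sequences, linker_mass=LINKER_MASS, missed_cleavages=1):
    """Enumerate every peptide pair and every pair of linked residues."""
    peptides = sorted({p for s in sequences
                       for p in digest(s, missed_cleavages) if link_sites(p)})
    library = []
    for a, b in combinations_with_replacement(peptides, 2):
        mass = peptide_mass(a) + peptide_mass(b) + linker_mass
        for i in link_sites(a):
            for j in link_sites(b):
                if a == b and j < i:
                    continue  # (i, j) and (j, i) are the same species
                library.append(DiPeptide(a, i, b, j, mass))
    return library

def match_precursor(library, observed_mz, charge, tol_ppm=10.0):
    """Return library entries whose mass matches an observed precursor."""
    neutral = (observed_mz - PROTON) * charge
    return [e for e in library
            if abs(e.mass - neutral) / e.mass * 1e6 <= tol_ppm]
```

Entries that share both sequences but differ in the linked residues have identical precursor masses, so `match_precursor` alone cannot separate them. That is the situation in which fragment (MS2) evidence and a second score of the kind described above become necessary.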
A unique mathematical model for estimating the level of false positive matching was
derived based on an in silico simulation of false positive spectrum matching using
randomly generated di-peptide sequences. Subsets of the simulation data were
modelled using disparate functions, which were subsequently combined to yield a
composite model that described expected false matching under various conditions.
The refined calibration of this model against simulation data was performed using the R
programming language. AnchorMS also implements this model as a dynamic false
positive threshold, where score values greater than the threshold are considered
likely to be true spectrum matches.
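The simulation idea behind such a threshold can be illustrated with a short Python sketch. It is not the composite model described above. It simply scores randomly generated di-peptide sequences against an observed peak list and takes an empirical quantile of those random scores as the threshold. The scoring function (a count of matched singly charged b- and y-ions), the tolerance, the quantile and all names are assumptions made for the example, and fragments that would carry the cross-linked partner are ignored for brevity.

```python
import random
from bisect import bisect_left

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

def fragment_mzs(peptide):
    """Singly charged b- and y-ion m/z values of a linear peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    mzs, running = [], 0.0
    for m in masses[:-1]:
        running += m
        mzs.append(running + PROTON)                  # b ion
        mzs.append(total - running + WATER + PROTON)  # y ion
    return mzs

def count_matches(theoretical, observed_sorted, tol=0.02):
    """Number of theoretical m/z values with an observed peak within tol."""
    hits = 0
    for mz in theoretical:
        k = bisect_left(observed_sorted, mz - tol)
        if k < len(observed_sorted) and observed_sorted[k] <= mz + tol:
            hits += 1
    return hits

def random_dipeptide_fragments(rng, length_a, length_b):
    """Fragments of two random sequences standing in for a false di-peptide."""
    alphabet = sorted(RESIDUE_MASS)
    a = "".join(rng.choice(alphabet) for _ in range(length_a))
    b = "".join(rng.choice(alphabet) for _ in range(length_b))
    return fragment_mzs(a) + fragment_mzs(b)

def false_positive_threshold(observed, length_a, length_b,
                             trials=2000, quantile=0.95, seed=0, tol=0.02):
    """Empirical score that random di-peptides rarely exceed."""
    rng = random.Random(seed)
    observed_sorted = sorted(observed)
    scores = sorted(
        count_matches(random_dipeptide_fragments(rng, length_a, length_b),
                      observed_sorted, tol)
        for _ in range(trials)
    )
    return scores[min(int(quantile * trials), trials - 1)]
```

Because the threshold is recomputed for each spectrum and each pair of peptide lengths, it rises for dense spectra and long peptides, where chance matches are more likely. That is the sense in which a threshold of this kind is dynamic. A fitted model, such as the one described above, replaces the repeated simulation with a function evaluated at run time.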

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:ufs/oai:etd.uovs.ac.za:etd-08072014-104720
Date: 07 August 2014
Creators: Mayne, Shannon LN
Contributors: Prof H Patterton
Publisher: University of the Free State
Source Sets: South African National ETD Portal
Language: en-uk
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.uovs.ac.za//theses/available/etd-08072014-104720/restricted/
Rights: unrestricted, I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to University Free State or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report.
