Global ETD Search

1	Filtering Methods for Mass Spectrometry-based Peptide Identification Processes 2013 October 1900 (has links) Tandem mass spectrometry (MS/MS) is a powerful tool for identifying peptide sequences. In a typical experiment, incorrect peptide identifications may result due to noise contained in the MS/MS spectra and to the low quality of the spectra. Filtering methods are widely used to remove the noise and improve the quality of the spectra before the subsequent spectra identification process. However, existing filtering methods often use features and empirically assigned weights. These weights may not reflect the reality that the contribution (reflected by weight) of each feature may vary from dataset to dataset. Therefore, filtering methods that can adapt to different datasets have the potential to improve peptide identification results. This thesis proposes two adaptive filtering methods; denoising and quality assessment, both of which improve efficiency and effectiveness of peptide identification. First, the denoising approach employs an adaptive method for picking signal peaks that is more suitable for the datasets of interest. By applying the approach to two tandem mass spectra datasets, about 66% of peaks (likely noise peaks) can be removed. The number of peptides identified later by peptide identification on those datasets increased by 14% and 23%, respectively, compared to previous work (Ding et al., 2009a). Second, the quality assessment method estimates the probabilities of spectra being high quality based on quality assessments of the individual features. The probabilities are estimated by solving a constraint optimization problem. Experimental results on two datasets illustrate that searching only the high-quality tandem spectra determined using this method saves about 56% and 62% of database searching time and loses 9% of high-quality spectra. Finally, the thesis suggests future research directions including feature selection and clustering of peptides. Tandem mass spectrometry peptide identification denoise quality assessment.
2	Improved Neuropeptide Identification : Bioinformatics and Mass Spectrometry Fälth Savitski, Maria January 2008 (has links) Bioinformatic methods were developed for improved identification of endogenous peptides using mass spectrometry. As a framework for these methods, a database for endogenous peptides, SwePep, was created. It was designed for storing information about endogenous peptides including tandem mass spectra. SwePep can be used for identification and validation of endogenous peptides by comparing experimentally derived masses of peptides and their fragments with information in the database. To improve automatic peptide identification of neuropeptides, targeted sequence collections that better mimic the peptidomic sample was derived from the SwePep database. Three sequence collections were created: SwePep precursors, SwePep peptides, and SwePep predicted. The searches for neuropeptides performed against these three sequence collections were compared with searches performed against the entire mouse proteome, and it was observed that three times as many peptides were identified with the targeted SwePep sequence collections. Applying the targeted SwePep sequence collections to identification of previously uncharacterized peptides yielded 27 novel potentially bioactive neuropeptides. Two fragmentations studies were performed using high mass accuracy tandem mass spectra of tryptic peptides. For this purpose, two databases were created: SwedCAD and SwedECD for CID and ECD tandem mass spectra, respectively. In the first study, fragmentation pattern of peptides with missed cleaved sites was studied using SwedCAD. It was observed that peptides with two arginines positioned next to each other have the same ability to immobilize two protons as peptides with two distant arginines. In the second study, SwedECD was used for studying small neutral losses from the reduced species in ECD fragmentation. The neutral losses were characterized with regard to their specificity and sensitivity to function as reporter ions for revealing the presence of specific amino acids in the peptide sequence. The results from these two studies can be used to improve identification of both tryptic and endogenous peptides. In summary, a collection of methods was developed that greatly improved the sensitivity of mass spectrometry peptide identification. bioinformatics neuropepides database peptide identification peptide fragmentation mass spectrometry tandem mass spectromerty Bioinformatics Bioinformatik
3	Algorithms for Characterizing Peptides and Glycopeptides with Mass Spectrometry He, Lin January 2013 (has links) The emergence of tandem mass spectrometry (MS/MS) technology has significantly accelerated protein identification and quantification in proteomics. It enables high-throughput analysis of proteins and their quantities in a complex protein mixture. A mass spectrometer can easily and rapidly generate large volumes of mass spectral data for a biological sample. This bulk of data makes manual interpretation impossible and has also brought numerous challenges in automated data analysis. Algorithmic solutions have been proposed and provide indispensable analytical support in current proteomic experiments. However, new algorithms are still needed to either improve result accuracy or provide additional data analysis capabilities for both protein identification and quantification. Accurate identification of proteins in a sample is the preliminary requirement of a proteomic study. In many cases, a mass spectrum cannot provide complete information to identify the peptide without ambiguity because of the inefficiency of the peptide fragmentation technique and the prevalent existence of noise. We propose ADEPTS to this problem using the complementary information provided in different types of mass spectra. Meanwhile, the occurrence of posttranslational modifications (PTMs) on proteins is another major issue that prevents the interpretation of a large portion of spectra. Using current software tools, users have to specify possible PTMs in advance. However, the number of possible PTMs has to be limited since specifying more PTMs to the software leads to a longer running time and lower result accuracy. Thus, we develop DeNovoPTM and PeaksPTM to provide efficient and accurate solutions. Glycosylation is one of the most frequently observed PTMs in proteomics. It plays important roles in many disease processes and thus has attracted growing research interest. However, lack of algorithms that can identify intact glycopeptides has become the major obstacle that hinders glycoprotein studies. We propose a novel algorithm, GlycoMaster DB, to fulfil this urgent requirement. Additional research is presented on protein quantification, which studies the changes of protein quantity by comparing two or more mass spectral datasets. A crucial problem in the quantification is to correct the retention time distortions between different datasets. Heuristic solutions from previous research have been used in practice but none of them has yet claimed a clear optimization goal. To address this issue, we propose a combinatorial model and practical algorithms for this problem. Bioinformatics Computational biology Mass spectrometry Protein/peptide identification Protein quantification Posttranslational modification Glycosylation Computer Science
4	Eﬀective Strategies for Improving Peptide Identiﬁcation with Tandem Mass Spectrometry Han, Xi January 2011 (has links) Tandem mass spectrometry (MS/MS) has been routinely used to identify peptides from protein mixtures in the field of proteomics. However, only about 30% to 40% of current MS/MS spectra can be identified, while many of them remain unassigned, even though they are of reasonable quality. The ubiquitous presence of post-translational modifications (PTMs) is one of the reasons for current low spectral identification rate. In order to identify post-translationally modified peptides, most existing software requires the specification of a few possible modifications. However, such knowledge of possible modifications is not always available. In this thesis, we describe a new algorithm for identifying modified peptides without requiring users to specify the possible modifications before the search routine; instead, all modifications from the Unimod database are considered. Meanwhile, several new techniques are employed to avoid the exponential growth of the search space, as well as to control the false discoveries due to this unrestricted search approach. A software tool, PeaksPTM, has been developed and it has already achieved a stronger performance than competitive tools for unrestricted identification of post-translationally modified peptides. Another important reason for the failure of the search tools is the inaccurate mass or charge state measurement of the precursor peptide ion. In this thesis, we study the precursor mono-isotopic mass and charge determination problem, and propose an algorithm to correct precursor ion mass error by assessing the isotopic features in its parent MS spectrum. The algorithm has been tested on two annotated data sets and achieved almost 100 percent accuracy. Furthermore, we have studied a more complicated problem, the MS/MS preprocessing problem, and propose a spectrum deconvolution algorithm. Experiments were provided to compare its performance with other existing software. Bioinformatics Mass Spectrometry peptide identification post-translational modification data pre-processing Proteomics Computer Science
5	Algorithms for Characterizing Peptides and Glycopeptides with Mass Spectrometry He, Lin January 2013 (has links) The emergence of tandem mass spectrometry (MS/MS) technology has significantly accelerated protein identification and quantification in proteomics. It enables high-throughput analysis of proteins and their quantities in a complex protein mixture. A mass spectrometer can easily and rapidly generate large volumes of mass spectral data for a biological sample. This bulk of data makes manual interpretation impossible and has also brought numerous challenges in automated data analysis. Algorithmic solutions have been proposed and provide indispensable analytical support in current proteomic experiments. However, new algorithms are still needed to either improve result accuracy or provide additional data analysis capabilities for both protein identification and quantification. Accurate identification of proteins in a sample is the preliminary requirement of a proteomic study. In many cases, a mass spectrum cannot provide complete information to identify the peptide without ambiguity because of the inefficiency of the peptide fragmentation technique and the prevalent existence of noise. We propose ADEPTS to this problem using the complementary information provided in different types of mass spectra. Meanwhile, the occurrence of posttranslational modifications (PTMs) on proteins is another major issue that prevents the interpretation of a large portion of spectra. Using current software tools, users have to specify possible PTMs in advance. However, the number of possible PTMs has to be limited since specifying more PTMs to the software leads to a longer running time and lower result accuracy. Thus, we develop DeNovoPTM and PeaksPTM to provide efficient and accurate solutions. Glycosylation is one of the most frequently observed PTMs in proteomics. It plays important roles in many disease processes and thus has attracted growing research interest. However, lack of algorithms that can identify intact glycopeptides has become the major obstacle that hinders glycoprotein studies. We propose a novel algorithm, GlycoMaster DB, to fulfil this urgent requirement. Additional research is presented on protein quantification, which studies the changes of protein quantity by comparing two or more mass spectral datasets. A crucial problem in the quantification is to correct the retention time distortions between different datasets. Heuristic solutions from previous research have been used in practice but none of them has yet claimed a clear optimization goal. To address this issue, we propose a combinatorial model and practical algorithms for this problem. Bioinformatics Computational biology Mass spectrometry Protein/peptide identification Protein quantification Posttranslational modification Glycosylation Computer Science
6	Eﬀective Strategies for Improving Peptide Identiﬁcation with Tandem Mass Spectrometry Han, Xi January 2011 (has links) Tandem mass spectrometry (MS/MS) has been routinely used to identify peptides from protein mixtures in the field of proteomics. However, only about 30% to 40% of current MS/MS spectra can be identified, while many of them remain unassigned, even though they are of reasonable quality. The ubiquitous presence of post-translational modifications (PTMs) is one of the reasons for current low spectral identification rate. In order to identify post-translationally modified peptides, most existing software requires the specification of a few possible modifications. However, such knowledge of possible modifications is not always available. In this thesis, we describe a new algorithm for identifying modified peptides without requiring users to specify the possible modifications before the search routine; instead, all modifications from the Unimod database are considered. Meanwhile, several new techniques are employed to avoid the exponential growth of the search space, as well as to control the false discoveries due to this unrestricted search approach. A software tool, PeaksPTM, has been developed and it has already achieved a stronger performance than competitive tools for unrestricted identification of post-translationally modified peptides. Another important reason for the failure of the search tools is the inaccurate mass or charge state measurement of the precursor peptide ion. In this thesis, we study the precursor mono-isotopic mass and charge determination problem, and propose an algorithm to correct precursor ion mass error by assessing the isotopic features in its parent MS spectrum. The algorithm has been tested on two annotated data sets and achieved almost 100 percent accuracy. Furthermore, we have studied a more complicated problem, the MS/MS preprocessing problem, and propose a spectrum deconvolution algorithm. Experiments were provided to compare its performance with other existing software. Bioinformatics Mass Spectrometry peptide identification post-translational modification data pre-processing Proteomics Computer Science
7	A Comparison of Standard Denoising Methods for Peptide Identification Carpenter, Skylar 01 May 2019 (has links) Peptide identification using tandem mass spectrometry depends on matching the observed spectrum with the theoretical spectrum. The raw data from tandem mass spectrometry, however, is often not optimal because it may contain noise or measurement errors. Denoising this data can improve alignment between observed and theoretical spectra and reduce the number of peaks. The method used by Lewis et. al (2018) uses a combined constant and moving threshold to denoise spectra. We compare the effects of using the standard preprocessing methods baseline removal, wavelet smoothing, and binning on spectra with Lewis et. al’s threshold method. We consider individual methods and combinations, using measures of distance from Lewis et. al's scoring function for comparison. Our findings showed that no single method provided better results than Lewis et. al's, but combining techniques with that of Lewis et. al's reduced the distance measurements and size of the data set for many peptides. Peptide identification denoising tandem mass spectrometry baseline wavelet binning Applied Statistics
8	Peptide Refinement by Using a Stochastic Search Lewis, Nicole H., Hitchcock, David B., Dryden, Ian L., Rose, John R. 01 November 2018 (has links) Identifying a peptide on the basis of a scan from a mass spectrometer is an important yet highly challenging problem. To identify peptides, we present a Bayesian approach which uses prior information about the average relative abundances of bond cleavages and the prior probability of any particular amino acid sequence. The scoring function proposed is composed of two overall distance measures, which measure how close an observed spectrum is to a theoretical scan for a peptide. Our use of our scoring function, which approximates a likelihood, has connections to the generalization presented by Bissiri and co-workers of the Bayesian framework. A Markov chain Monte Carlo algorithm is employed to simulate candidate choices from the posterior distribution of the peptide sequence. The true peptide is estimated as the peptide with the largest posterior density. Bayesian methods Markov chain Monte Carlo sampling peptide identification stochastic search tandem mass spectrometry Mathematics and Statistics
9	PARALLEL COMPUTING ALGORITHMS FOR TANDEM 2013 April 1900 (has links) Tandem mass spectrometry, also known as MS/MS, is an analytical technique to measure the mass-to-charge ratio of charged ions and widely used in genomics, proteomics and metabolomics areas. There are two types of automatic ways to interpret tandem mass spectra: de novo methods and database searching methods. Both of them need to use massive computational resources and complicated comparison algorithms. The real-time peptide-spectrum matching (RT-PSM) algorithm is a database searching method to interpret tandem mass spectra with strict time constraints. Restricted by the hardware and architecture of an individual workstation the RT-PSM algorithm has to sacrifice the level of accuracy in order to provide prerequisite processing speed. The peptide-spectrum similarity scoring module is the most time-consuming part out of four modules in the RT-PSM algorithm, which is also the core of the algorithm. In this study, a multi-core computing algorithm is developed for individual workstations. Moreover, a distributed computing algorithm is designed for a cluster. The improved algorithms can achieve the speed requirement of RT-PSM without sacrificing the accuracy. With some expansion, this distributed computing algorithm can also support different PSM algorithms. Simulation results show that compared with the original RT-PSM, the parallelization version achieves 25 to 34 times speed-up based on different individual workstations. A cluster with 240 CPU cores could accelerate the similarity score module 210 times compare with the single-thread similarity score module and the whole peptide identification process 85 times compare with the single-thread peptide identification process. tandem mass spectrum parallel computing algorithm multi-core computing algorithm distributed computing algorithm peptide identification
10	Podobnostní vyhledávání v databázích hmotnostních spekter / Similarity search in Mass Spectra Databases Novák, Jiří January 2013 (has links) Shotgun proteomics is a widely known technique for identification of protein and peptide sequences from an "in vitro" sample. A tandem mass spectrometer generates tens of thousands of mass spectra which must be annotated with peptide sequences. For this purpose, the similarity search in a database of theoretical spectra generated from a database of known protein sequences can be utilized. Since the sizes of databases grow rapidly in recent years, there is a demand for utilization of various database indexing techniques. We investigate the capabilities of (non)metric access methods as the database indexing techniques for fast and approximate similarity retrieval in mass spectra databases. We show that the method for peptide sequences identification is more than 100x faster than a sequential scan over the entire database while more than 90% of spectra are correctly annotated with peptide sequences. Since the method is currently suitable for small mixtures of proteins, we also utilize a precursor mass filter as the database indexing technique for complex mixtures of proteins. The precursor mass filter followed by ranking of spectra by a modification of the parametrized Hausdorff distance outperforms state-of-the-art tools in the number of identified peptide sequences and the speed of search. The...

Search results