classCleaner: A Quantitative Method for Validating Peptide Identification in LC-MS/MS Workflows
Key, Melissa Chester 05 1900
Indiana University-Purdue University Indianapolis (IUPUI)

Because label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS)
shotgun proteomics infers the peptide sequence of each measurement, there is inherent
uncertainty in the identity of each peptide and its originating protein. Removing
misidentified peptides can improve the accuracy and power of downstream analyses
when differences between proteins are of primary interest.
In this dissertation I present classCleaner, a novel algorithm designed to identify
misidentified peptides from each protein using the available quantitative data. The
algorithm is based on the idea that distances between peptides belonging to the same
protein are stochastically smaller than those between peptides in different proteins.
The method first determines a threshold based on the estimated distribution of these
two groups of distances. This is used to create a decision rule for each peptide based
on counting the number of within-protein distances smaller than the threshold.
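
As a rough illustration of the decision rule described above, the following Python sketch flags peptides whose within-protein distances are mostly larger than a threshold. It is not the dissertation's implementation. The Euclidean distance, the midpoint-of-medians threshold, the `min_fraction` cutoff, and the function names are all assumptions made for this example, since the abstract does not specify how the threshold is estimated or how many small distances a peptide needs in order to be retained.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance between every pair of rows (peptides) in X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def flag_peptides(X, labels, threshold=None, min_fraction=0.5):
    """Return (keep, threshold) for a peptides-by-samples matrix X.

    keep[i] is True when peptide i has enough within-protein distances
    below the threshold. All tuning choices here are illustrative
    assumptions, not the rule used by classCleaner itself.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = pairwise_distances(X)

    # Masks for "same protein" pairs, excluding each peptide paired with itself.
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    within_mask = same & off_diag

    if threshold is None:
        # Assumption: place the threshold halfway between the medians of the
        # within-protein and between-protein distance groups. This needs at
        # least one pair of each kind to be well defined.
        within = D[within_mask]
        between = D[~same]
        threshold = 0.5 * (np.median(within) + np.median(between))

    # Count, for each peptide, the within-protein distances below the threshold.
    counts = ((D < threshold) & within_mask).sum(axis=1)
    n_peers = within_mask.sum(axis=1)

    # Assumption: keep a peptide if at least min_fraction of its peers are close.
    keep = counts >= min_fraction * n_peers
    return keep, threshold


# Toy data: four peptides per protein, plus one peptide labelled "A"
# whose profile actually resembles protein "B".
X = np.array([
    [0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1],
    [10.0, 10.0, 10.0],  # labelled "A" but lies among the "B" peptides
    [10.0, 10.0, 10.1], [10.0, 10.1, 10.0], [10.1, 10.0, 10.0], [10.1, 10.1, 10.0],
])
labels = ["A"] * 5 + ["B"] * 4
keep, threshold = flag_peptides(X, labels)
print(keep)
```

In this toy example only the fifth peptide is flagged for removal, because none of its four within-protein distances falls below the threshold. A fixed fraction is only one possible cutoff; a rule calibrated to the estimated distance distributions, as the abstract suggests, would replace `min_fraction`.
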
Using simulated data, I show that classCleaner always reduces the proportion
of misidentified peptides, with better results for larger proteins (by number of constituent
peptides), smaller inherent misidentification rates, and larger sample sizes.
classCleaner is also applied to an LC-MS/MS proteomics data set and to the Congressional
Voting Records data set from the UCI Machine Learning Repository. The latter
is used to demonstrate that the algorithm is not specific to proteomics.