Global ETD Search

Return to search

A systems approach to computational protein identification

Proteomics is the science of understanding the dynamic protein content of an organism's cells (its proteome), which is one of the largest current challenges in biology. Computational proteomics is an active research area that involves in-silico methods for the analysis of high-throughput protein identification data. Current methods are based on a technology called tandem mass spectrometry (MS/MS) and suffer from low coverage and accuracy, reliably identifying only 20-40% of the
proteome. This dissertation addresses recall, precision, speed and scalability of computational proteomics experiments.

This research goes beyond the traditional paradigm of analyzing MS/MS experiments in isolation, instead learning priors of protein presence from the joint analysis of various systems biology data sources. This integrative `systems' approach to protein identification is very effective, as demonstrated by two new methods. The first, MSNet, introduces a social model for protein identification and leverages functional dependencies from genome-scale, probabilistic, gene functional networks. The second, MSPresso, learns a gene expression prior from a joint analysis of mRNA and proteomics experiments on similar samples.

These two sources of prior information result in more accurate estimates of protein presence, and increase protein recall by as much as 30% in complex samples, while also increasing precision. A comprehensive suite of benchmarking datasets is
introduced for evaluation in yeast. Methods to assess statistical significance in the absence of ground truth are also introduced and employed whenever applicable.

This dissertation also describes a database indexing solution to improve speed and scalability of protein identification experiments. The method, MSFound, customizes a metric-space database index and its associated approximate k-nearest-neighbor search algorithm with a semi-metric distance designed to match noisy spectra. MSFound achieves an order of magnitude speedup over traditional spectra database searches while maintaining scalability. / text

http://hdl.handle.net/2152/ETD-UT-2010-05-1036

Computational biology

Bioinformatics

Integrative statistical data analysis

Computational proteomics

Systems biology

Database indexing

Identifer	oai:union.ndltd.org:UTEXAS/oai:repositories.lib.utexas.edu:2152/ETD-UT-2010-05-1036
Date	21 October 2010
Creators	Ramakrishnan, Smriti Rajan
Source Sets	University of Texas
Language	English
Detected Language	English
Type	thesis
Format	application/pdf

Page generated in 0.002 seconds

A systems approach to computational protein identification

Description

Links & Downloads

Tags

Additional Fields