The discovery of proteomic information through the use of mass spectrometry (MS) has been an active area of research in the diagnosis and prognosis of many types of cancer. This process involves feature selection through peak detection but is often complicated by many forms of non-biologicalbias. The need to extract biologically relevant peak information from MS data has resulted in the development of statistical techniques to aid in spectra pre-processing. Baseline estimation and normalization are important pre-processing steps because the subsequent quantification of peak heights depends on this baseline estimate. This dissertation introduces a mixture model to estimate the baseline and peak heights simultaneously through the expectation-maximization (EM) algorithm and a penalized likelihood approach. Our model-based pre-processing performs well in the presence of raw, unnormalized data, with few subjective inputs. We also propose a model-based normalization solution for use in subsequent classification procedures, where misclassification results compare favorably with existing methods of normalization. The performance of our pre-processing method is evaluated using popular matrix-assisted laser desorption and ionization (MALDI) and surface-enhanced laser desorption and ionization (SELDI) datasets as well as through simulation.
Identifer | oai:union.ndltd.org:tamu.edu/oai:repository.tamu.edu:1969.1/ETD-TAMU-2009-12-7230 |
Date | 2009 December 1900 |
Creators | Wagaman, John C. |
Contributors | Huang, Jianhua, West, Webster |
Source Sets | Texas A and M University |
Language | en_US |
Detected Language | English |
Type | Book, Thesis, Electronic Dissertation, text |
Format | application/pdf |
Page generated in 0.0015 seconds