1 |
Topic Model-based Mass Spectrometric Data Analysis in Cancer Biomarker Discovery Studies
Wang, Minkun, 14 June 2017
Identification of disease-related alterations in molecular and cellular mechanisms may reveal useful biomarkers for human diseases, including cancers. High-throughput omic technologies for identifying and quantifying multi-level biological molecules (e.g., proteins, glycans, and metabolites) have facilitated advances in biological research in recent years. Liquid (or gas) chromatography coupled with mass spectrometry (LC/GC-MS) has become an essential tool in such large-scale omic studies. Appropriate LC/GC-MS data preprocessing pipelines are needed to detect true differences between biological groups. Challenges exist in several aspects of MS data analysis. Specifically for biomarker discovery, one fundamental challenge in the quantitation of biomolecules arises from the heterogeneous nature of human biospecimens. Although this issue has been a subject of discussion in cancer genomic studies, it has not yet been rigorously investigated in mass spectrometry-based omic studies. Purification of mass spectrometric data is highly desirable prior to subsequent differential analysis.
In this dissertation, we mainly address the purification problem through probabilistic modeling. We propose an intensity-level purification model (IPM) to computationally purify LC/GC-MS based cancerous data in biomarker discovery studies. We further extend IPM to a scan-level purification model (SPM) by considering information from the extracted ion chromatogram (EIC, a scan-level feature). Both IPM and SPM are topic modeling approaches, which aim to identify the underlying "topics" (sources) and their mixture proportions in the heterogeneous data. Additionally, a denoise deconvolution model (DDM) is proposed to capture the noise signals in samples based on purified profiles. Variational expectation-maximization (VEM) and Markov chain Monte Carlo (MCMC) methods are used to draw inference on the latent variables and estimate the model parameters. Besides purification, other research topics related to mass spectrometric data analysis for cancer biomarker discovery are also investigated in this dissertation.
Chapter 3 discusses the methods developed for the differential analysis of LC/GC-MS based omic data, specifically for the preprocessing of data from LC-MS profiled glycans. Chapter 4 presents the assumptions and inference details of IPM, SPM, and DDM. A latent Dirichlet allocation (LDA) core is used to model the heterogeneous cancerous data as mixtures of topics consisting of a sample-specific pure cancerous source and non-cancerous contaminants. We evaluated the capability of the proposed models in capturing mixture proportions of contaminants and cancer profiles on LC-MS based serum and tissue proteomic and GC-MS based tissue metabolomic datasets acquired from patients with hepatocellular carcinoma (HCC) and liver cirrhosis. Chapter 5 elaborates on these applications in cancer biomarker discovery, where typical single-omic studies and integrative analyses of multi-omic studies are included.
Ph. D.
This dissertation documents the methodology and outputs for computational deconvolution of heterogeneous omics data generated from biospecimens of interest. These omics data convey qualitative and quantitative information on biomolecules (e.g., glycans, proteins, and metabolites) that are profiled by liquid (or gas) chromatography coupled with mass spectrometry (LC/GC-MS). In biomarker discovery, we aim to find significant differences in the intensities of biomolecules between two specific phenotype groups, so that the biomarkers can be used as clinical indicators for early-stage diagnosis. However, the purity of the collected samples constitutes a fundamental challenge to differential analysis. Instead of relying on experimental methods that are costly and time-consuming, we treat the purification task as a topic modeling procedure, where we assume each observed biomolecular profile is a mixture of a hidden pure source and unwanted contaminants.
The developed models output the estimated mixture proportions as well as the underlying “topics”. With purification applied at different levels, improved discrimination power of candidate biomarkers and more biologically meaningful pathways were discovered in LC/GC-MS based multi-omic studies of liver cancer. This research work originates from the broader scope of probabilistic generative modeling, where rational assumptions are made to characterize the generation process of the observations. Therefore, the models developed in this dissertation have potential applications beyond the heterogeneous data purification discussed here. A good example is uncovering the relationship between the human gut microbiome and the host’s phenotypes of interest (e.g., diseases such as type-II diabetes). Similar challenges exist in inferring the underlying intestinal flora distribution and estimating the mixture proportions.
This dissertation also covers related topics in data preprocessing and integration, with the consistent goal of improving the performance of biomarker discovery. In summary, the research helps address the sample heterogeneity issue observed in LC/GC-MS based cancer biomarker discovery studies and sheds light on computational deconvolution of the mixtures, which can be generalized to other domains of interest.
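The mixture assumption in this abstract can be illustrated with a deliberately simplified sketch. It is not IPM, SPM, or DDM: it assumes only two topics whose profiles are already known, and it estimates a single mixture proportion with plain expectation-maximization rather than VEM or MCMC. The function name, the toy profiles, and the 0.7 proportion are all hypothetical.

```python
import numpy as np

def estimate_mixture_proportion(counts, topic_a, topic_b, n_iter=200):
    """EM estimate of the weight of topic_a in a two-topic multinomial mixture.

    Assumption for illustration: both topic profiles are known and strictly
    positive; only the mixing weight is unknown.
    """
    counts = np.asarray(counts, dtype=float)
    topic_a = np.asarray(topic_a, dtype=float)
    topic_b = np.asarray(topic_b, dtype=float)
    weight = 0.5  # initial guess for the proportion of topic_a
    for _ in range(n_iter):
        # E-step: probability that signal in each feature came from topic_a
        joint_a = weight * topic_a
        joint_b = (1.0 - weight) * topic_b
        resp_a = joint_a / (joint_a + joint_b)
        # M-step: share of the total signal attributed to topic_a
        weight = float(np.sum(counts * resp_a) / np.sum(counts))
    return weight

# Hypothetical toy profiles over four features (each sums to 1).
pure_source = np.array([0.50, 0.30, 0.15, 0.05])
contaminant = np.array([0.05, 0.15, 0.30, 0.50])

# Noise-free "observed" profile: 70% pure source, 30% contaminant.
true_weight = 0.7
observed = 10000 * (true_weight * pure_source + (1 - true_weight) * contaminant)

estimated = estimate_mixture_proportion(observed, pure_source, contaminant)
print(f"estimated proportion of pure source: {estimated:.4f}")
```

In the dissertation's setting the topics themselves are latent and must be inferred jointly with the proportions, which is why an LDA core with VEM or MCMC inference is used instead of this closed-form update.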
|
2 |
Large Volume Injection and Hyphenated Techniques for Gas Chromatographic Determination of PBDEs and Carbazoles in Air
Tollbäck, Petter, January 2005
This thesis is based on studies in which the suitability of various gas chromatography (GC) injection techniques was examined for the determination of polybrominated diphenyl ethers (PBDEs) and carbazoles, two groups of compounds that are thermally labile and/or have high boiling points. For such substances, it is essential to introduce the samples into the GC system in an appropriate way to avoid degradation and other potential problems. In addition, different types of gas chromatographic column systems and mass spectrometric detectors were evaluated for the determination of PBDEs.
Conventional injectors, such as splitless, on-column and programmed temperature vaporizing (PTV) injectors, were evaluated and optimized for the determination of PBDEs. The results show on-column injection to be the best option, providing low discrimination and high precision. The splitless injector is commonly used for “dirty” samples. However, it is not suitable for the determination of the high molecular weight congeners, since it tends to discriminate against them and promote their degradation, leading to poor precision and accuracy. The PTV injector appears to be a more suitable alternative. The use of liners reduces problems associated with potential interferents such as polar compounds and lipids, and, compared with the hot splitless injector, the PTV provides gentler solvent evaporation owing to its temperature programming feature, leading to low discrimination and variance.
Increasing the injection volume from the conventional 1-3 µL to >50 µL offers two main benefits. Firstly, the overall detection and quantification limits are decreased, since the entire sample extract can be injected into the GC system. Secondly, large volume injections enable hyphenation of preceding techniques such as liquid chromatography (LC), solid phase extraction and other kinds of extraction. Large-volume injections were utilized and optimized in the studies included in this thesis.
With a loop-type injector/interface, large sample volumes can be injected on-column with a low risk of discrimination against compounds with low volatility. This injector was used for the determination of PBDEs in air and as an interface for the determination of carbazoles by LC-GC. Peak distortion, a frequently encountered problem with this type of injector, was addressed and solved during the work underlying this thesis.
The PTV can be used as a large volume injector, in so-called solvent vent mode. This technique was evaluated for the determination of PBDEs and as an interface for coupling dynamic sonication-assisted solvent extraction online to GC. The results show that careful optimization of the injection parameters is required, but also that the PTV is robust and yields reproducible results.
PBDEs are commonly detected using mass spectrometry in electron capture negative ionization (ECNI) mode, monitoring bromine ions (m/z 79 and 81). The mass spectrometric properties of the fully brominated diphenyl ether, BDE-209, have been investigated. A high molecular weight fragment at m/z 486/488 enables the use of 13C-labeled BDE-209 as an internal surrogate standard.
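The first benefit of large volume injection described in this abstract can be put into rough numbers. The sketch below rests on a simplifying assumption that is not taken from the thesis: the detector needs a fixed absolute mass of analyte on-column, independent of injection volume, and solvent and matrix effects are ignored. The 1 pg limit and the function name are hypothetical.

```python
def concentration_detection_limit(absolute_limit_pg, injection_volume_ul):
    """Lowest detectable extract concentration (pg/µL) for a given injection volume.

    Assumes the absolute on-column mass limit does not change with volume.
    """
    if injection_volume_ul <= 0:
        raise ValueError("injection volume must be positive")
    return absolute_limit_pg / injection_volume_ul

absolute_limit_pg = 1.0    # hypothetical absolute mass limit of the detector
conventional_ul = 2.0      # within the conventional 1-3 µL range
large_volume_ul = 50.0     # lower bound of the large volume range

conventional_limit = concentration_detection_limit(absolute_limit_pg, conventional_ul)
large_volume_limit = concentration_detection_limit(absolute_limit_pg, large_volume_ul)
improvement = large_volume_ul / conventional_ul

print(f"conventional injection: {conventional_limit} pg/µL")
print(f"large volume injection: {large_volume_limit} pg/µL")
print(f"concentration limit lowered by a factor of {improvement}")
```

Under this assumption the gain scales linearly with the injected volume; in practice the achievable gain depends on how well the injection parameters are optimized, as the abstract notes for the PTV.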
|