
Multi-Platform Molecular Data Integration and Disease Outcome Analysis

Survival time is one of the most common measures of clinical outcome. Accurately linking cancer molecular profiling with survival outcome advances the clinical management of cancer. However, existing survival analysis relies heavily on statistical evidence from a single level of data, paying little attention to the integration of interacting multi-level data or to the underlying biology. Advances in genomic techniques make it possible to characterize cancer tissue more completely than before, opening the opportunity to design biologically informed and integrative approaches to survival analysis. Many cancer tissues have been profiled for gene expression levels and genomic variants (such as copy number alterations, sequence mutations, DNA methylation, and histone modification). However, it is not clear how to integrate gene expression and genomic variants to achieve better prediction and understanding of cancer survival.

To address this challenge, we propose two approaches to data integration that strengthen the feature selection process, both biologically and statistically, so that the true predictors of survival are detected. The first approach is data-driven yet biologically informed. Consistent with the biological hierarchy from DNA to RNA, we prioritize each survival-relevant feature with two separate scores, predictive and mechanistic. For mRNA expression levels, predictive features are those mRNAs whose variation in expression level is associated with the survival outcome, and mechanistic features are those mRNAs whose variation in expression level is associated with genomic variants (copy number alterations (CNAs) in this study). Further, we propose simultaneously integrating information from both the predictive model and the mechanistic model through our new approach GEMPS (Gene Expression as a Mediator for Predicting Survival). Applied to two cancer types (ovarian and glioblastoma multiforme), our method achieved better prediction power than peer methods. Gene set enrichment analysis confirms that the genes used in the final survival analysis are biologically important and relevant.
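The two-score idea can be illustrated with a minimal Python sketch. The abstract does not say how GEMPS computes or combines its scores, so everything here is an assumption made for illustration: the mechanistic score is taken to be the squared Pearson correlation between a gene's CNA and its expression, the predictive score is a rescaled Harrell concordance index between expression and survival, and the product used in `combined_score` is a hypothetical stand-in for the integration step.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def concordance_index(risk, time, event):
    """Harrell's C: share of comparable pairs in which the higher-risk
    subject has the shorter survival time. Ties in risk count as 0.5."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i has an observed event before time[j].
            if event[i] and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else 0.5

def mechanistic_score(cna, expr):
    """Assumed score: how strongly expression tracks copy number (0 to 1)."""
    return pearson(cna, expr) ** 2

def predictive_score(expr, time, event):
    """Assumed score: how strongly expression orders survival (0 to 1)."""
    return abs(2.0 * concordance_index(expr, time, event) - 1.0)

def combined_score(cna, expr, time, event):
    """Hypothetical integration: a gene ranks high only if both scores are high."""
    return mechanistic_score(cna, expr) * predictive_score(expr, time, event)
```

Ranking genes by `combined_score` would favor those whose expression is both driven by CNA and informative about survival, which is the mediator role the acronym GEMPS suggests; a product is only one of many ways to express that requirement.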

The second approach is a generic mathematical framework that biologically regularizes the Cox proportional hazards model, which is widely used in survival analysis. We propose a penalty function that both links the mechanistic model to the clinical model and reflects the downstream regulatory effect of genomic variants on the mRNA expression levels of their target genes. Fast and efficient optimization principles, such as coordinate descent and majorization-minimization, are adopted to infer the coefficients of the Cox model predictors. Through this model, we extend the regulator-target gene relationship to a regulator-target-outcome relationship for a disease. Assessed through a simulation study and an analysis of two real cancer data sets, the proposed method performed better at selecting the true predictors and at predicting survival. Because the mechanistic model and the clinical model are biologically linked, the selected model is also more interpretable.
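To make the optimization idea concrete, here is a minimal Python sketch of coordinate descent with a majorization-minimization step on a penalized Cox partial likelihood. The penalty proposed in the dissertation is not given in the abstract; the weighted L1 penalty below, with per-gene `weights` standing in for mechanistic information, is an assumption chosen only because it has a closed-form coordinate update. All function names are hypothetical.

```python
from math import exp, log

def soft_threshold(z, t):
    """Closed-form minimizer of 0.5*(b - z)**2 + t*|b| over b."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def risk_set(time, i):
    """Indices of subjects still at risk at subject i's event time."""
    return [k for k in range(len(time)) if time[k] >= time[i]]

def linear_predictor(beta, X):
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

def neg_log_partial_likelihood(beta, X, time, event):
    """Negative log partial likelihood of the Cox model (Breslow form)."""
    eta = linear_predictor(beta, X)
    total = 0.0
    for i in range(len(time)):
        if event[i]:
            total -= eta[i] - log(sum(exp(eta[k]) for k in risk_set(time, i)))
    return total

def penalized_objective(beta, X, time, event, weights, lam):
    """Assumed objective: partial likelihood plus a weighted L1 penalty."""
    penalty = lam * sum(w * abs(b) for w, b in zip(weights, beta))
    return neg_log_partial_likelihood(beta, X, time, event) + penalty

def gradient_j(beta, X, time, event, j):
    """Partial derivative of the negative log partial likelihood w.r.t. beta[j]."""
    eta = linear_predictor(beta, X)
    g = 0.0
    for i in range(len(time)):
        if event[i]:
            rs = risk_set(time, i)
            denom = sum(exp(eta[k]) for k in rs)
            mean_xj = sum(exp(eta[k]) * X[k][j] for k in rs) / denom
            g -= X[i][j] - mean_xj
    return g

def curvature_bound_j(X, time, event, j):
    """Upper bound on the second derivative in coordinate j: each risk-set
    variance is at most (range)**2 / 4, so the quadratic majorizes the loss."""
    h = 0.0
    for i in range(len(time)):
        if event[i]:
            vals = [X[k][j] for k in risk_set(time, i)]
            h += (max(vals) - min(vals)) ** 2 / 4.0
    return h

def fit_penalized_cox(X, time, event, weights, lam, n_sweeps=200):
    """Cycle over coordinates; each update minimizes a quadratic majorizer
    of the loss plus the L1 term, so the objective never increases."""
    p = len(X[0])
    beta = [0.0] * p
    bounds = [curvature_bound_j(X, time, event, j) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            h = bounds[j]
            if h == 0.0:
                continue  # constant feature within every risk set
            g = gradient_j(beta, X, time, event, j)
            beta[j] = soft_threshold(beta[j] - g / h, lam * weights[j] / h)
    return beta
```

In this sketch a gene with a small weight is penalized lightly and is more likely to stay in the model, so setting small weights for genes whose expression is explained by CNAs would be one way to pass mechanistic evidence into the clinical model. Whether the dissertation uses weights of this kind is not stated in the abstract.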

Other important forms of clinical outcome are the monitoring of angiogenesis (the formation of new blood vessels that a tumor needs to nourish itself and sustain its existence) and the assessment of therapeutic response. Both can be done through dynamic imaging, in which a series of images of a specific tumor site is acquired at different time points after injection of a contrast agent. Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a noninvasive tool for examining tumor vasculature patterns based on the accumulation and washout of the contrast agent. DCE-MRI indicates tumor vasculature permeability, which in turn indicates the angiogenic activity of the tumor. Observing this activity over time can reflect the drug responsiveness of the tumor and the efficacy of the treatment plan. However, because of the limited resolution of imaging scanners, a partial-volume effect (PVE) occurs: signals from two or more tissues combine to produce a single concentration value within a pixel, which leads to inaccurate estimates of the pharmacokinetic parameters. A multi-tissue compartmental modeling (CM) technique supported by convex analysis of mixtures (CAM) is used to mitigate the PVE by clustering pixels and constructing a simplex whose vertices are of a single compartment type. CAM uses the identified pure-volume pixels to estimate the kinetics of the tissues under investigation. We propose an enhanced version of CAM-CM that identifies pure-volume pixels more accurately. The enhancement considers the neighborhood effect on each pixel and uses a barycentric coordinate system to identify more pure-volume pixels and to test those identified by CAM-CM. Tested on simulated DCE-MRI data, the enhanced CAM-CM achieved better accuracy and reproducibility.

Ph. D.

Disease outcome can refer to an event, state, condition, or behavior for some aspect of a patient’s health status. An event can express survival, while a behavior can assess drug efficacy and treatment responsiveness. To gain a deeper and, hence, better understanding of diseases, symptom inspection has shifted from the physical symptoms that appear externally on the human body to internal symptoms that require invasive and noninvasive techniques to detect and quantify. These internal symptoms can be further divided into phenotypic and genotypic symptoms. Examples of phenotypes include the shape, structure, and volume of a specific organ or tissue. Examples of genotypes are the dosage of genetic information and the activity of genes, where genes determine the function of the cells that constitute tissues.
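The barycentric-coordinate test mentioned for the enhanced CAM-CM can be sketched in Python as follows. The sketch assumes the simplest setting, which the abstract does not specify: pixel time courses have already been projected to two dimensions and the simplex is a triangle with three tissue compartments at its vertices. The function names and the tolerance `tol` are hypothetical, and the neighborhood effect is not modeled here.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of 2D point p with respect to triangle (a, b, c).
    The three coordinates sum to 1; each one is the mixing fraction of a vertex."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle: vertices are collinear")
    l1 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    l2 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return l1, l2, 1.0 - l1 - l2

def pure_compartment(p, a, b, c, tol=0.05):
    """Return the index (0, 1, or 2) of the vertex a pixel is pure in,
    or None if the pixel is a mixture or lies outside the triangle.
    A pixel counts as pure when one coordinate is at least 1 - tol."""
    coords = barycentric(p, a, b, c)
    if min(coords) < -tol:
        return None  # outside the simplex beyond the tolerance
    best = max(range(3), key=lambda k: coords[k])
    return best if coords[best] >= 1.0 - tol else None
```

A pixel close to a vertex has one coordinate near 1 and is treated as a pure-volume pixel of that compartment, while a pixel in the interior has several sizable coordinates and is treated as a partial-volume mixture. In practice the tolerance would need to be tuned to the noise level of the data.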

Linking disease phenotypes and genotypes to disease outcomes is of great importance for widening the understanding of disease mechanisms and progression. In this dissertation, we propose novel computational techniques that integrate data generated from different platforms, where each data type addresses one aspect of the internal symptoms of a disease, to provide a wider picture and a deeper understanding of the disease. We use imaging and genomic data with applications in ovarian, glioblastoma multiforme, and breast cancers to test the proposed techniques. These techniques aim to provide outcomes that are statistically significant, as current peer methods do, and that also offer biological insights, which current peer methods lack.

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/73580
Date: 06 December 2016
Creators: Youssef, Ibrahim Mohamed
Contributors: Electrical and Computer Engineering, Yu, Guoqiang, Wang, Yue J., Ressom, Habtom W., Lu, Chang-Tien, Clancy, Thomas Charles III
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Detected Language: English
Type: Dissertation
Format: ETD, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/
