1

Differential Dependency Network and Data Integration for Detecting Network Rewiring and Biomarkers

Fu, Yi 30 January 2020
Rapid advances in high-throughput molecular profiling techniques have enabled large-scale genomics-, transcriptomics-, and proteomics-based biomedical studies, generating an enormous amount of multi-omics data. Processing and summarizing multi-omics data, modeling interactions among biomolecules, and detecting condition-specific dysregulation using multi-omics data are some of the most important yet challenging analytics tasks. In the case of detecting somatic DNA copy number aberrations using bulk tumor samples in cancer research, normal cell contamination is a significant confounding factor that weakens detection power regardless of the method used. To address this problem, we propose a computational approach, BACOM 2.0, to more accurately estimate the normal cell fraction and accordingly reconstruct DNA copy number signals in cancer cells. Specifically, by introducing allele-specific absolute normalization, BACOM 2.0 can accurately detect deletion types and aneuploidy in cancer cells directly from DNA copy number data. Genes work through complex networks to support cellular processes. Dysregulated genes can cause structural changes in biological networks, also known as network rewiring. Genes with a large number of rewired edges are more likely to be associated with functional alterations leading to phenotype transitions, and hence are potential biomarkers in diseases such as cancers. The differential dependency network (DDN) method was proposed to detect such network rewiring and biomarkers. However, the existing DDN method and software tool have two major drawbacks. First, in imbalanced sample groups, DDN suffers from systematic bias and produces false positive differential dependencies. Second, the computational time of the block coordinate descent algorithm in DDN increases rapidly with the number of involved samples and molecular entities.
To address the imbalanced sample group problem, we propose a sample-scale-wide normalized formulation to correct systematic bias and design a simulation study to test its performance. To address the high computational complexity, we propose several strategies to accelerate DDN learning: two reformulated algorithms for block-wise coefficient updating in the DDN optimization problem, one strategy for discarding predictors, and one strategy for parallel computing. Experimental results show that DDN learning with the combined acceleration strategies is hundreds of times faster than the original method on medium-sized data. We applied the DDN method to several biomedical omics datasets and detected significant phenotype-specific network rewiring. With a random-graph-based detection strategy, we discovered hub-node-defined biomarkers that helped to generate or validate several novel scientific hypotheses in collaborative research projects. For example, the hub genes detected by the DDN methods in proteomics data from artery samples are significantly enriched in the citric acid cycle pathway, which plays a critical role in the development of atherosclerosis. To detect intra-omics and inter-omics network rewiring, we propose a method called multiDDN that uses a multi-layer signaling model to integrate multi-omics data. We adapt the block coordinate descent algorithm, together with the acceleration strategies, to solve the multiDDN optimization problem. The simulation study shows that, compared with the DDN method on single-omics data, the multiDDN method is considerably more accurate at detecting network rewiring. We applied the multiDDN method to real multi-omics data from the CPTAC ovarian cancer dataset and detected multiple hub genes that are associated with histone protein deacetylation and were previously reported in independent ovarian cancer data analyses.
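As a rough illustration of what a random-graph-based hub test can look like, the sketch below scores each node by how unlikely its number of rewired edges would be if the same number of edges were placed uniformly at random. The abstract does not describe the thesis's actual detection procedure, so the Erdős–Rényi null model, the binomial tail test, the function names, and the 0.01 threshold are all assumptions made for this example.

```python
from collections import Counter
from math import comb


def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def hub_pvalues(nodes, rewired_edges):
    """Score nodes by degree in the rewiring graph (hypothetical helper).

    Assumed null model: each of the n*(n-1)/2 possible edges is rewired
    independently with probability p = observed edges / possible edges,
    so a node's degree follows Binomial(n - 1, p).
    """
    n = len(nodes)
    p = len(rewired_edges) / (n * (n - 1) / 2)
    degree = Counter()
    for u, v in rewired_edges:
        degree[u] += 1
        degree[v] += 1
    return {g: binom_upper_tail(degree[g], n - 1, p) for g in nodes}


# Toy rewiring graph: g0 touches 8 of the 10 rewired edges.
nodes = [f"g{i}" for i in range(20)]
edges = [("g0", f"g{i}") for i in range(1, 9)] + [("g10", "g11"), ("g12", "g13")]
pvals = hub_pvalues(nodes, edges)
hubs = [g for g, pv in pvals.items() if pv < 0.01]  # assumed threshold
```

In practice the p-values would also need multiple-testing correction across genes, which this sketch omits.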
/ Doctor of Philosophy / We witnessed the start of the Human Genome Project decades ago and have since stepped into the era of omics. Omics are comprehensive approaches for analyzing genome-wide biomolecular profiles. The rapid development of high-throughput technologies enables us to produce an enormous amount of omics data, such as genomics, transcriptomics, and proteomics data, leaving researchers swimming in a sea of omics information that was once unimaginable. Yet the era of omics brings new challenges: to process the huge volumes of data, to summarize the data, to reveal the interactions between entities, to link various types of omics data, and to discover the mechanisms hidden behind omics data. In processing omics data, one factor that weakens follow-up data analysis is sample impurity. We call tumor samples contaminated by normal cells heterogeneous samples. The genomic signals measured from heterogeneous samples are a mixture of signals from both tumor cells and normal cells. To correct the mixed signals and recover the true signals from pure tumor cells, we propose a computational approach called BACOM 2.0 to estimate the normal cell fraction and correct the genomic signals accordingly. By introducing a novel normalization method that identifies the neutral component in mixed signals of genomic copy number data, BACOM 2.0 can accurately detect genes' deletion types and abnormal chromosome numbers in tumor cells. In cells, genes connect to other genes and form complex biological networks to perform their functions. Dysregulated genes can cause structural changes in biological networks, also known as network rewiring. In a biological network with network rewiring events, a large number of rewiring events linked to a single hub gene suggests concentrated gene dysregulation.
This hub gene has more impact on the network and hence is more likely to be associated with a functional change of the network, which ultimately leads to abnormal phenotypes such as cancer. Therefore, the hub genes linked with network rewiring are potential indicators of disease status, also known as biomarkers. The differential dependency network (DDN) method was proposed to detect network rewiring events and biomarkers from omics data. However, the DDN method still has a few drawbacks. First, for two groups of data with unequal sample sizes, DDN consistently detects false targets of network rewiring. The permutation test, which applies the same method to randomly shuffled samples and is supposed to distinguish true targets from random effects, suffers from the same problem and can let those false targets pass. We propose a new formulation that corrects the errors caused by unequal group sizes and design a simulation study to test the new formulation's correctness. Second, the computing time for solving DDN problems is unbearably long when processing omics data with a large number of samples or a large number of genes. We propose several strategies to increase DDN's computation speed, including three redesigned formulas for efficiently updating the results, one rule to preselect predictor variables, and one acceleration technique that uses multiple CPU cores simultaneously. In the timing test, the accelerated DDN method is much faster than the original method. To detect network rewiring within the same omics data or between different types of omics data, we propose a method called multiDDN that uses an integrated model to process multiple types of omics data. We solve the new problem by adapting the block coordinate descent algorithm. The test on simulated data shows that multiDDN outperforms single-omics DDN.
We applied the DDN or multiDDN method to several omics datasets and detected significant network rewiring associated with diseases. We detected hub nodes from the network rewiring events. These hub genes, as potential biomarkers, help us ask new and meaningful questions in related research.
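The idea that a measured signal is a mixture of tumor and normal signals can be made concrete with a small worked example. The sketch below assumes a simple linear mixing model, observed = f * normal + (1 - f) * tumor, with a diploid normal copy number of 2; it only inverts that model for a known normal cell fraction f. It is not BACOM 2.0, whose fraction estimation and allele-specific normalization are not described in the abstract, and the function name is hypothetical.

```python
def tumor_copy_number(observed, normal_fraction, normal_copies=2.0):
    """Invert an assumed linear mixture to recover the tumor-only copy number.

    Assumed model (not taken from the thesis):
        observed = normal_fraction * normal_copies
                   + (1 - normal_fraction) * tumor
    """
    if not 0.0 <= normal_fraction < 1.0:
        raise ValueError("normal_fraction must be in [0, 1)")
    return (observed - normal_fraction * normal_copies) / (1.0 - normal_fraction)


# A one-copy deletion in the tumor, measured in a sample with 40% normal cells:
# observed = 0.4 * 2 + 0.6 * 1 = 1.4
recovered = tumor_copy_number(observed=1.4, normal_fraction=0.4)
```

The example shows why contamination weakens detection: the one-copy loss appears as 1.4 copies in the bulk sample, much closer to the normal value of 2 than the true tumor value of 1.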
2

Multi-Platform Molecular Data Integration and Disease Outcome Analysis

Youssef, Ibrahim Mohamed 06 December 2016
One of the most common measures of clinical outcome is survival time. Accurately linking cancer molecular profiling with survival outcome advances the clinical management of cancer. However, existing survival analysis relies heavily on statistical evidence from a single level of data, without paying much attention to the integration of interacting multi-level data and the underlying biology. Advances in genomic techniques provide unprecedented power to characterize cancer tissue more completely than before, opening the opportunity to design biologically informed and integrative approaches for survival analysis. Many cancer tissues have been profiled for gene expression levels and genomic variants (such as copy number alterations, sequence mutations, DNA methylation, and histone modification). However, it is not clear how to integrate gene expression and genetic variants to achieve better prediction and understanding of cancer survival. To address this challenge, we propose two approaches for data integration that boost the feature selection process both biologically and statistically, so that the true predictors of survival are properly detected. The first approach is data-driven yet biologically informed. Consistent with the biological hierarchy from DNA to RNA, we prioritize each survival-relevant feature with two separate scores, predictive and mechanistic. For mRNA expression levels, predictive features are those mRNAs whose variation in expression level is associated with the survival outcome, and mechanistic features are those mRNAs whose variation in expression level is associated with genomic variants (copy number alterations (CNAs) in this study). Further, we propose simultaneously integrating information from both the predictive model and the mechanistic model through our new approach, GEMPS (Gene Expression as a Mediator for Predicting Survival).
Applied to two cancer types (ovarian cancer and glioblastoma multiforme), our method achieved better prediction power than peer methods. Gene set enrichment analysis confirms that the genes used for the final survival analysis are biologically important and relevant. The second approach is a generic mathematical framework to biologically regularize the Cox proportional hazards model, which is widely used in survival analysis. We propose a penalty function that both links the mechanistic model to the clinical model and reflects the biological downstream regulatory effect of the genomic variants on the mRNA expression levels of the target genes. Fast and efficient optimization principles such as coordinate descent and majorization-minimization are adopted to infer the coefficients of the Cox model predictors. Through this model, we extend the regulator-target gene relationship to a regulator-target-outcome relationship of a disease. Assessed via a simulation study and an analysis of two real cancer data sets, the proposed method performed better at selecting the true predictors and achieved better survival prediction. Because it biologically links the mechanistic model and the clinical model, the proposed method makes the selected model insightful and meaningfully interpretable. Other important forms of clinical outcome assessment are monitoring angiogenesis (the formation of new blood vessels that a tumor needs to nourish itself and sustain its existence) and assessing therapeutic response. This can be done through dynamic imaging, in which a series of images is acquired at different time instances for a specific tumor site after injection of a contrast agent. Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a noninvasive tool for examining tumor vasculature patterns based on the accumulation and washout of the contrast agent.
DCE-MRI gives an indication of tumor vasculature permeability, which in turn indicates the tumor's angiogenic activity. Observing this activity over time can reflect the tumor's drug responsiveness and the efficacy of the treatment plan. However, due to the limited resolution of imaging scanners, a partial-volume effect (PVE) problem occurs: signals from two or more tissues combine to produce a single image concentration value within a pixel, which leads to inaccurate estimates of the pharmacokinetic parameters. A multi-tissue compartmental modeling (CM) technique supported by convex analysis of mixtures (CAM) is used to mitigate the PVE by clustering pixels and constructing a simplex whose vertices are each of a single compartment type. CAM uses the identified pure-volume pixels to estimate the kinetics of the tissues under investigation. We propose an enhanced version of CAM-CM to identify pure-volume pixels more accurately. This includes considering the neighborhood effect on each pixel and using a barycentric coordinate system both to identify more pure-volume pixels and to test those identified by CAM-CM. Tested on simulated DCE-MRI data, the enhanced CAM-CM achieved better accuracy and reproducibility. / Ph. D.
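To show how barycentric coordinates can express how "pure" a pixel is relative to simplex vertices, the sketch below solves for the weights that write a point as an affine combination of the vertices. It is a generic geometric illustration, not the enhanced CAM-CM procedure: the 2-D toy triangle, the function name, and the 0.95 purity threshold are assumptions, and the neighborhood effect mentioned in the abstract is not modeled.

```python
import numpy as np


def barycentric_weights(vertices, point):
    """Return weights w with sum(w) == 1 and w @ vertices == point.

    `vertices` has shape (k + 1, k): the corners of a simplex in k dimensions.
    A point lies inside the simplex when every weight is non-negative.
    """
    vertices = np.asarray(vertices, dtype=float)
    point = np.asarray(point, dtype=float)
    # Stack the coordinate equations with the sum-to-one constraint.
    system = np.vstack([vertices.T, np.ones(len(vertices))])
    rhs = np.append(point, 1.0)
    return np.linalg.solve(system, rhs)


# Toy simplex: each vertex stands for one pure tissue compartment.
triangle = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

w_mixed = barycentric_weights(triangle, [0.25, 0.25])  # a partial-volume pixel
w_pure = barycentric_weights(triangle, [0.98, 0.01])   # close to the second vertex

PURITY_THRESHOLD = 0.95  # assumed cutoff, for illustration only
mixed_is_pure = bool(w_mixed.max() > PURITY_THRESHOLD)
pure_is_pure = bool(w_pure.max() > PURITY_THRESHOLD)
```

Under this reading, a pixel whose largest weight is close to 1 is dominated by one compartment, while a pixel with several comparable weights is a mixture.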
