
New Approaches of Differential Gene Expression Analysis and Cancer Immune Evasion Mechanism Identification

Background: Genomic and epigenomic data analysis has been a popular research area in the 21st century. Common research problems include detecting differentially expressed genes between groups, clustering and classification using genomic data to study the heterogeneity of a disease, and dividing a sequence of measurements along a genome into segments to identify different functional regions of the genome. This study gives a comprehensive investigation of the aforementioned tasks, with emphasis on developing new computational methodologies. Normalization is an important data preparation step in gene expression analyses: it removes various kinds of systematic noise, thereby reducing sample variance and increasing the power of subsequent statistical analyses. On the other hand, this variance reduction is made possible by borrowing information across all genes, including differentially expressed genes (DEGs) and outliers, which inevitably introduces bias into the data. A question of interest is how to avoid the inflation of the type I error rate and the loss of statistical power incurred by this bias. Breast cancer (BRCA) can escape immune surveillance using six known evasion mechanisms, yet the complexity of the combinations of these mechanisms used by subsets of human BRCA patients is not fully understood. In the era of immunotherapy and personalized medicine, there is an urgent need to advance the knowledge of immune evasion clusters (IEC) in BRCA and to identify reliable biomarkers, which is essential for a better understanding of patients’ response to immunotherapies and for the rational design of clinical trials of combination immunotherapies. Identification of functionally enriched regions of a genome often requires dividing a sequence of measurements along the genome into segments where adjacent segments have different properties (e.g., mean values).
Despite dozens of algorithms developed to address this issue, accuracy and computational efficiency still need to be improved to tackle both existing and emerging segmentation problems in genomic and epigenomic research. Results: In Chapter 1 of this study, we propose a new differential gene expression analysis pipeline, Super-delta, that pairs a modified t-test derived from large-sample theory with a robust multivariate extension of global normalization designed to minimize the bias introduced by DEGs. In simulation studies, Super-delta was compared to four commonly used normalization methods (global, median-IQR, quantile, and cyclic loess normalization) and was shown to have better statistical power with tighter type I error control. We then applied all methods to a microarray gene expression dataset on BRCA patients who received neoadjuvant chemotherapy. Super-delta identified marginally more DEGs than its competitors, in addition to the substantial overlap of DEGs identified by all of them. Appropriate adaptations are under active development to extend this framework to RNA-Seq data and to more general between-group comparison problems. In Chapter 2, we developed a sequential biclustering (SBiC) method based on an existing biclustering approach that uses the plaid model, and applied it to the log2-normalized RNA-Seq data of immune-related genes of BRCA patients from The Cancer Genome Atlas (TCGA). We identified seven clusters for 81% of the studied samples. We found that 78.8% of these samples evade through TGF-β immunosuppression, 57.75% through DcR3 counterattack, 48% through CTLA4, and 27.8% through PD-1. Interestingly, the combination of TGF-β and DcR3 was pronounced in 57.75% of patients, and evasion through DcR3 was exclusive to the lobular invasive subgroup.
In addition, triple-negative breast cancer (TNBC) patients split equally into two clusters: one with impaired antigen presentation and another with high leukocyte recruitment but a combination of four evasion mechanisms. We also identified biomarkers that play important roles in distinguishing immune evasion mechanisms. These findings provide a better understanding of patients’ response to immunotherapies and shed light on the rational design of novel combination immunotherapies. In Chapter 3, we designed an efficient algorithm called iSeg for the segmentation of genomic and epigenomic profiles. It first uses dynamic programming to identify candidate significant segments, then uses a novel data structure based on coupled balanced binary trees to detect overlapping significant segments and update them simultaneously during the searching and refinement stages. Merging of significant segments is performed at the end to generate the final set of segments. The algorithm can serve as a general computational framework that works with different model assumptions about the data. As a general procedure, it can segment different types of genomic and epigenomic data, such as DNA copy number variation, nucleosome occupancy, and (differential) nuclease sensitivity. We evaluated iSeg using both simulated and experimental datasets and showed that it performs satisfactorily when compared with some popular methods, which often employ more sophisticated statistical models. Implemented in C++, iSeg is computationally efficient and well suited for long sequences and large numbers of input data profiles. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2018. / July 11, 2018. / differential gene expression analysis, immune evasion mechanism, robust data normalization, segmentation, sequential biclustering, Super-delta / Includes bibliographical references.
/ Jinfeng Zhang, Professor Directing Dissertation; Qing-Xiang (Amy) Sang, University Representative; Qing Mai, Committee Member; Yiyuan She, Committee Member.

Identifier: oai:union.ndltd.org:fsu.edu/oai:fsu.digital.flvc.org:fsu_657789
Contributors: Liu, Yuhang (author), Zhang, Jinfeng (professor directing dissertation), Sang, Qing-Xiang (university representative), Mai, Qing (committee member), She, Yiyuan (committee member), Florida State University (degree granting institution), College of Arts and Sciences (degree granting college), Department of Statistics (degree granting department)
Publisher: Florida State University
Source Sets: Florida State University
Language: English
Detected Language: English
Type: Text, doctoral thesis
Format: 1 online resource (96 pages), computer, application/pdf
