
Tumor subclone structure reconstruction with genomic variation data

Thesis advisor: Gabor Marth / Unlike normal tissue cells, which contain identical copies of the same genome, tumors are composed of genetically divergent cell subpopulations, or subclones. The ability to identify the number of subclones, their frequencies within the tumor mass, and the evolutionary relationships among them is crucial to understanding the basis of tumorigenesis, drug response, relapse, and metastasis. It is also essential for informed, personalized therapeutic decisions. Studies have attempted to reconstruct subclone structure by identifying distinct allele frequency distribution modes at a handful of somatic single nucleotide variant loci, but the question had not been adequately addressed by computational means at the start of this dissertation work, and recent efforts either enforce certain assumptions or resort to statistical procedures that cannot guarantee coverage of the complete solution space. This dissertation presents a computational framework that examines somatic variation events, such as copy number changes, loss of heterozygosity, or point mutations, in order to identify the underlying subclone structure. Chapter 2 discusses the presence of intra-tumoral heterogeneity and, for historical interest, a method to reconstruct the parsimonious solution based on simplifying assumptions about the tumor micro-evolution process. Results of applying this method to clinical datasets for Ovarian Serous Carcinoma and Intracranial Germ Cell Tumor, which confirmed the genomic complexity, are also presented. Linkage information, i.e. whether two mutations co-localize in the same cancer cell, is lost during tissue homogenization and DNA fragmentation, which are common sample preparation steps in whole genome profiling techniques. As a result, more than one subclone model is often capable of explaining the observation.
Chapter 3 describes an extended method that searches for all models consistent with the observation. Consequently, the solution for a specific input dataset is a set of possible subclone structures. The method then trims this solution space when more than one sample from the same patient is available, such as primary and relapse tumor pairs. Furthermore, a statistical framework is developed that, when further trimming is not possible, predicts whether two mutations co-localize in the same subclone. The formal definition of the subclone structure reconstruction problem, as well as techniques for pre-processing various types of genomic variation data, are also given here. Results from the analysis of published and novel datasets are included, covering cancer types including Acute Myeloid Leukemia, Sinonasal Undifferentiated Carcinoma, and Ovarian Serous Carcinoma, and data types including whole genome sequencing, copy number arrays, single nucleotide polymorphism arrays, and single nucleotide variant calls from deep sequencing. They show that the method is applicable to this wide range of cancer and data types, can independently replicate published conclusions that were based on manual reasoning, and yields novel insights into the patterns of tumor recurrence and chemoresistance. They also show that the method can be valuable in prioritizing variants for functional study. / Thesis (PhD), Boston College, 2014. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Biology.

Identifier: oai:union.ndltd.org:BOSTON/oai:dlib.bc.edu:bc-ir_104182
Date: January 2014
Creators: Qiao, Yi
Publisher: Boston College
Source Sets: Boston College
Language: English
Detected Language: English
Type: Text, thesis
Format: electronic, application/pdf
Rights: Copyright is held by the author, with all rights reserved, unless otherwise noted.
