
Mining patterns in genomic and clinical cancer data to characterize novel driver genes

Cancer research, like many areas of science, is adapting to a new era characterized by increasing quantity, quality, and diversity of observational data. The Cancer Genome Atlas exemplifies both these advances and the challenges they bring. This enormous public effort has provided genomic profiles of hundreds of tumors for each of the most common solid cancer types. Alongside this resource is a host of other data and knowledge, including gene interaction databases, Mendelian disease causal variants, and electronic health records spanning many millions of patients. A current challenge is therefore how best to integrate these data to discover mechanisms of oncogenesis and cancer progression. Ultimately, this integration could enable genomics-based prediction of an individual patient's outcome and the selection of targeted therapies, a goal termed precision medicine. In this thesis, I develop novel approaches that examine patterns in populations of cancer patients to identify key genetic changes and to suggest the likely roles of these driver genes in disease.
In the first section, I show how genomics can lead to the identification of driver alterations in melanoma. The most recurrent mutations often fall in important cancer driver genes. In a newly sequenced melanoma cohort, recurrent inactivating mutations point to FBXW7 as a promising new candidate tumor suppressor in melanoma, with therapeutic implications.
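Ranking genes by recurrence of inactivating mutations can be illustrated with a minimal sketch. It assumes mutation calls arrive as hypothetical `(sample_id, gene, variant_class)` tuples, and that "inactivating" means the variant classes in `INACTIVATING`. That set is an assumption made for illustration, not the filter used in the thesis. The gene names in the example are placeholders, not data from the melanoma cohort.

```python
from collections import defaultdict

# Assumed set of variant classes treated as likely inactivating (illustrative only).
INACTIVATING = {"nonsense", "frameshift", "splice_site"}


def recurrence_of_inactivating(mutations, n_samples):
    """Rank genes by the fraction of tumors carrying >= 1 inactivating mutation.

    mutations: iterable of (sample_id, gene, variant_class) tuples (hypothetical format).
    n_samples: total number of sequenced tumors in the cohort.
    """
    samples_per_gene = defaultdict(set)
    for sample, gene, variant_class in mutations:
        if variant_class in INACTIVATING:
            # A set counts each tumor at most once per gene.
            samples_per_gene[gene].add(sample)
    freqs = {gene: len(s) / n_samples for gene, s in samples_per_gene.items()}
    # Highest frequency first; ties broken alphabetically for a stable order.
    return sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    toy_calls = [
        ("T1", "GENE_A", "nonsense"),
        ("T2", "GENE_A", "frameshift"),
        ("T2", "GENE_A", "nonsense"),   # second hit in T2 is not double-counted
        ("T3", "GENE_B", "missense"),   # not counted as inactivating
        ("T3", "GENE_C", "splice_site"),
    ]
    print(recurrence_of_inactivating(toy_calls, n_samples=4))
    # → [('GENE_A', 0.5), ('GENE_C', 0.25)]
```

Raw recurrence like this ignores gene length and background mutation rate. It is therefore only a first-pass ranking, not a significance test.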
However, each tumor is unique, so recurrence alone will never capture all of the mutations responsible for the disease. Tumors arise from random events that must cooperate to endow a cell with all of the invasive and immortal properties of a cancer. Some combinations of events are lethal to a developing tumor, while others are simply not preferentially selected. To discover these complex patterns, I develop a method based on the joint entropy of a set of genes, called GAMToC. Using GAMToC, I identify sets of recurrently altered genes with a strongly non-random joint pattern of co-occurrence and mutual exclusivity. I then extend the method to identify novel genes with a role in cancer by virtue of their non-random pattern of alteration. Insights into the roles of these novel drivers can come from their most strongly co-selected partners.
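The following sketch shows how joint entropy can expose a non-random alteration pattern in a gene set. It is not GAMToC's actual scoring. The abstract states only that the method is based on joint entropy, so the choices below are assumptions:

- The score compares the sum of single-gene entropies with the joint entropy. This difference is the total correlation, which is zero when genes are altered independently.
- Significance comes from a permutation null that shuffles each gene across tumors independently.

Input is a hypothetical list of 0/1 alteration vectors, one per gene.

```python
import math
import random
from collections import Counter


def joint_entropy(columns):
    """Shannon joint entropy (bits) of the alteration patterns of a gene set.

    columns: list of equal-length 0/1 lists, one per gene; index i is tumor i.
    """
    n = len(columns[0])
    patterns = Counter(tuple(col[i] for col in columns) for i in range(n))
    return -sum((c / n) * math.log2(c / n) for c in patterns.values())


def total_correlation(columns):
    """Sum of single-gene entropies minus the joint entropy (assumed score).

    Zero when the genes are altered independently across tumors; larger values
    indicate a non-random joint pattern such as co-occurrence or mutual exclusivity.
    """
    return sum(joint_entropy([col]) for col in columns) - joint_entropy(columns)


def permutation_pvalue(columns, n_perm=1000, seed=0):
    """Empirical p-value under a null that shuffles each gene independently.

    Shuffling keeps each gene's alteration frequency (so the single-gene
    entropies are unchanged) but breaks any dependence between genes.
    """
    rng = random.Random(seed)
    observed = total_correlation(columns)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(col, len(col)) for col in columns]
        if total_correlation(shuffled) >= observed - 1e-12:  # tolerance for float ties
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Two toy genes that are never altered in the same tumor (mutual exclusivity).
    gene_a = [1, 1, 0, 0, 0, 0, 0, 0]
    gene_b = [0, 0, 1, 1, 0, 0, 0, 0]
    print(round(total_correlation([gene_a, gene_b]), 4))  # → 0.1226
    print(permutation_pvalue([gene_a, gene_b], n_perm=500))
```

Because total correlation is the same whether the dependence comes from co-occurrence or from mutual exclusivity, it captures both kinds of pattern described above. Telling them apart requires inspecting which joint patterns are over- or under-represented.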
In the final section of the main text, I develop the use of cancer comorbidity, or increased cancer risk, as a novel data source for understanding cancer. The recent availability of clinical records spanning a large percentage of the American population has enabled the discovery of many cancer comorbidities. Although most cancers arise from somatic mutations accumulating over a patient's lifespan, mutations present at birth could predispose some rare populations to increased cancer risk. The phenotype of a Mendelian disease provides strong insight into the genotype of an affected individual. Suppose Mendelian diseases that are comorbid with a cancer can be shown to carry specific defects in processes important to the development of that cancer. Statistical comorbidity could then provide a new resource for prioritizing Mendelian disease genes as novel cancer-related genes. To this end, I integrate clinical comorbidity, Mendelian disease causal variants, and somatic genomic profiles of thousands of cancers. I demonstrate that comorbidity is indeed associated with significant genetic similarity between Mendelian diseases and the cancers to which these patients are predisposed, suggesting plausible new candidate cancer genes. While cancer may be the result of a series of selected random events, patterns of incidence across large populations, whether measured by genomics or by other phenotypes, contain much non-random signal yet to be mined.
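The two quantities being related, comorbidity and genetic similarity, can be made concrete with a minimal sketch under explicit assumptions. Comorbidity is expressed here as a relative risk computed from hypothetical patient counts. Genetic similarity is expressed as a one-sided hypergeometric test of overlap between a Mendelian disease gene set and a set of genes altered in a cancer. The abstract does not specify either measure, so both, along with all function names, are illustrative stand-ins rather than the thesis's method.

```python
from math import comb


def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Cancer incidence among patients with a Mendelian disease divided by
    incidence among patients without it (a simple comorbidity measure)."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


def overlap_pvalue(mendelian_genes, cancer_genes, universe_size):
    """One-sided hypergeometric P(overlap >= observed).

    Assumes both gene sets are drawn from a common universe of
    `universe_size` genes. Returns (observed overlap, p-value).
    """
    mendelian, cancer = set(mendelian_genes), set(cancer_genes)
    k = len(mendelian & cancer)
    n_draw, n_success = len(mendelian), len(cancer)
    total = comb(universe_size, n_draw)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(n_success, i) * comb(universe_size - n_success, n_draw - i)
        for i in range(k, min(n_draw, n_success) + 1)
    )
    return k, tail / total


if __name__ == "__main__":
    # Toy counts and gene names only; not results from the thesis.
    print(round(relative_risk(30, 1000, 100, 100000), 2))  # → 30.0
    k, p = overlap_pvalue({"G1", "G2", "G3"}, {"G1", "G2", "G4", "G5"}, universe_size=20)
    print(k, round(p, 4))  # → 2 0.0877
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the binomial coefficients. A genome-scale analysis would more likely call a library routine and correct for testing many disease–cancer pairs.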

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/D8KP8130
Date: January 2015
Creators: Melamed, Rachel D.
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses
