This dissertation presents a novel data mining algorithm for identifying molecular signatures, called attractor metagenes, from large biological data sets. It also presents a computational model for combining such signatures to create prognostic biomarkers. Applying the algorithm to multiple cancer data sets, we identified three such gene co-expression signatures that are present in nearly identical form across different tumor types and that represent biomolecular events in cancer, namely mitotic chromosomal instability, mesenchymal transition, and lymphocyte infiltration. A comprehensive experimental investigation of the mesenchymal transition attractor metagene using mouse xenograft models showed that the signature was expressed in the human cancer cells, but not in the mouse stroma. The attractor metagenes were used to build the winning model of a breast cancer prognosis challenge. When applied to larger data sets from 12 different cancer types from The Cancer Genome Atlas Pan-Cancer project, the algorithm identified additional pan-cancer molecular signatures, some of which involve methylation sites, microRNA expression, and protein activity.
Identifier | oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/D8NP22JK |
Date | January 2013 |
Creators | Cheng, Wei-Yi |
Source Sets | Columbia University |
Language | English |
Detected Language | English |
Type | Theses |