Cancer is a feared and deadly disease for which there is currently almost no cure, largely because its mechanisms remain poorly understood. This, in turn, reflects the complexity of the molecular activities that underlie cancer processes. Some variables of these processes, such as gene expression, copy number profiles and point mutations, have recently become measurable with high-throughput technologies. However, these data are massive and difficult to interpret, even for experts. Considerable effort is being devoted to developing engineering tools for the analysis and interpretation of these data for various purposes.
In this thesis, we focus on the problem of individuality in cancer. More specifically, we aim to identify subgroups of cases within a cancer, called subtypes, each characterized by its own set of processes. This problem has both theoretical and practical implications. Theoretically, a classification of cancer patients reflects an understanding of the disease and may help speed up drug development. Practically, different subgroups of patients can be treated with different protocols for optimal outcomes. To this end, we propose an approach with two specific aims: identifying subtypes in a given set of high-throughput data, and identifying candidate genes (called drivers) that drive the subtype-specific processes.
First, we assume that a subtype has a distinctive process compared not only with normal controls but also with other cases of the same cancer. This process is characterized by a set of differentially expressed genes found uniquely in the corresponding subtype, which we call its signature set. Based on this assumption, we develop a signature-based subtyping algorithm that, on the one hand, divides a set of cases into as many subtypes as possible and, on the other hand, merges subtypes whose signature sets are too small. Applying this algorithm to datasets of medulloblastoma, a pediatric brain tumor, we found that no more than three subtypes met these criteria.
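The signature criterion and the merging step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's actual algorithm: expression values are given per gene as lists over cases; a gene counts toward subtype k's signature only if its mean in k differs by at least a hypothetical threshold `delta` from the normal-control mean and from the mean of every other subtype; and the subtype with the smallest signature is merged into the subtype with the nearest centroid (a hypothetical merge rule) until every signature has at least `min_size` genes. The over-split starting partition is supplied by the caller, and all function names and data are illustrative.

```python
from math import dist
from statistics import mean


def group_mean(values, labels, k):
    """Mean of one gene's values over the cases labelled k."""
    return mean(v for v, lab in zip(values, labels) if lab == k)


def subtype_signature(expr, normal, labels, k, delta):
    """Genes differentially expressed in subtype k versus the normal controls
    AND versus every other subtype (hypothetical criterion: |mean gap| >= delta)."""
    others = sorted(set(labels) - {k})
    sig = set()
    for gene, values in expr.items():
        m_k = group_mean(values, labels, k)
        if abs(m_k - mean(normal[gene])) < delta:
            continue  # not different from the normal controls
        if all(abs(m_k - group_mean(values, labels, j)) >= delta for j in others):
            sig.add(gene)  # unique to subtype k
    return sig


def centroid(expr, labels, k):
    """Mean expression vector of subtype k (genes in sorted order)."""
    return [group_mean(expr[g], labels, k) for g in sorted(expr)]


def merge_small_signatures(expr, normal, labels, delta=1.0, min_size=1):
    """Start from an over-split partition and repeatedly merge the subtype with
    the smallest signature into its nearest subtype (Euclidean distance between
    centroids) until every signature has at least min_size genes."""
    labels = list(labels)
    while True:
        subtypes = sorted(set(labels))
        if len(subtypes) < 2:
            return labels
        sigs = {k: subtype_signature(expr, normal, labels, k, delta) for k in subtypes}
        weakest = min(subtypes, key=lambda k: len(sigs[k]))
        if len(sigs[weakest]) >= min_size:
            return labels
        c = centroid(expr, labels, weakest)
        target = min((k for k in subtypes if k != weakest),
                     key=lambda k: dist(c, centroid(expr, labels, k)))
        labels = [target if lab == weakest else lab for lab in labels]


# Toy data (illustrative only): 5 genes x 6 cases, normal controls at 0.
expr = {
    "g1": [5, 5, 0, 0, 0, 0],
    "g2": [5, 5, 0, 0, 0, 0],
    "g3": [0, 0, 5, 5, 5, 5],
    "g4": [0, 0, 5, 5, 5, 5],
    "g5": [0, 0, 5, 5, 0, 0],
}
normal = {g: [0.0, 0.0] for g in expr}
initial = ["A", "A", "B", "B", "C", "C"]
print(merge_small_signatures(expr, normal, initial))
# ['A', 'A', 'B', 'B', 'B', 'B']
```

On the toy data, subtype C has no unique genes (it matches B on g3 and g4 and the normals elsewhere), so it is merged into B, its nearest neighbour, leaving two subtypes whose signatures are both non-empty.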
Second, we explore subtype patterns in copy number profiles. By regarding all events on a chromosome arm as a single event, we quantize the copy number profiles into event profiles. We then design an unsupervised decision tree training algorithm specifically for detecting subtypes in these profiles. The trained decision tree is intuitive, predictive, easy to implement and deterministic. Its application to medulloblastoma datasets reveals interesting subtype patterns characterized by the co-occurrence of copy number alteration (CNA) events.
Electrical and Electronic Engineering / Doctoral / Doctor of Philosophy
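The arm-level quantization step can be illustrated with a minimal sketch. It assumes segmented copy number data given per case as (chromosome, start, end, log2 ratio) tuples, hypothetical toy centromere positions in arbitrary units, a length-weighted mean log2 ratio per arm, and a hypothetical gain/loss threshold of 0.2. The thesis's actual input format, calling rules and thresholds may differ, and the decision tree training itself is not shown.

```python
from collections import defaultdict

# Toy centromere positions in arbitrary units (hypothetical, NOT real genome coordinates).
CENTROMERES = {"1": 125, "7": 60, "17": 25}


def split_by_arm(chrom, start, end):
    """Split a segment [start, end) at the centromere into (arm, start, end) pieces."""
    cen = CENTROMERES[chrom]
    pieces = []
    if start < cen:
        pieces.append((chrom + "p", start, min(end, cen)))
    if end > cen:
        pieces.append((chrom + "q", max(start, cen), end))
    return [(arm, s, e) for arm, s, e in pieces if e > s]


def event_profile(segments, threshold=0.2):
    """Quantize one case's segments (chrom, start, end, log2_ratio) into one
    event per arm: +1 gain, -1 loss, 0 neutral, using the length-weighted mean."""
    covered = defaultdict(float)   # total segment length seen on each arm
    weighted = defaultdict(float)  # sum of length * log2 ratio on each arm
    for chrom, start, end, ratio in segments:
        for arm, s, e in split_by_arm(chrom, start, end):
            covered[arm] += e - s
            weighted[arm] += (e - s) * ratio
    profile = {}
    for arm, length in covered.items():
        m = weighted[arm] / length
        profile[arm] = 1 if m > threshold else (-1 if m < -threshold else 0)
    return profile


segments = [
    ("17", 0, 25, -0.5),   # loss confined to 17p
    ("17", 25, 80, 0.4),   # gain on 17q
    ("7", 0, 150, 0.3),    # gain spanning the whole of chromosome 7
    ("1", 0, 240, 0.05),   # near-neutral chromosome 1
]
print(event_profile(segments))
# {'17p': -1, '17q': 1, '7p': 1, '7q': 1, '1p': 0, '1q': 0}
```

Each resulting profile is a short vector of discrete arm-level events, the kind of input a decision tree can split on; co-occurring events, such as the 17p loss and 17q gain in this toy case, appear as joint values within one profile.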
Identifier | oai:union.ndltd.org:HKU/oai:hub.hku.hk:10722/180954 |
Date | January 2012 |
Creators | Chen, Peikai., 陈培凯. |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Source Sets | Hong Kong University Theses |
Language | English |
Detected Language | English |
Type | PG_Thesis |
Source | http://hub.hku.hk/bib/B49617746 |
Rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works; Creative Commons: Attribution 3.0 Hong Kong License |
Relation | HKU Theses Online (HKUTO) |