Glioblastoma multiforme (GBM) is an extremely aggressive and invasive brain cancer with a median survival of less than one year. In addition, owing to its anaplastic nature, the histological classification of this cancer is not straightforward. These characteristics make the disease an interesting and important target for new methods of analysis and classification.
In recent years, molecular information has been used to stratify and analyze GBM patients, but these approaches generally rely on single-’omic’ data to perform the classification, or use multi-’omic’ data only in a sequential manner.
In this project, a novel approach for the classification and analysis of patients with GBM is presented. The main objective of this work is to find clusters of patients with distinctive profiles using multi-’omic’ data with a truly integrative methodology.
In recent years, the TCGA consortium has made thousands of multi-’omic’ samples publicly available for multiple cancer types. Thanks to this, it was possible to obtain numerous GBM samples (> 300) with data for gene and microRNA expression, CpG site methylation and copy-number variation (CNV).
To achieve our objective, a mixture of linear models was built for each gene, using its expression as the output and a combination of multi-’omic’ data as covariates. Each model was coupled with a lasso penalization scheme and, thanks to the mixture nature of the model, it was possible to fit multiple submodels and thereby discover different linear relationships within the same model.
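The idea of a lasso-penalized mixture of linear models can be illustrated with a minimal sketch. This is not the implementation used in this work: it assumes Gaussian submodels fitted by a plain EM loop, and the names `weighted_lasso` and `fit_mixture`, the number of submodels `k`, the penalty `lam` and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def weighted_lasso(X, y, w, lam, n_iter=50):
    """Coordinate descent for 0.5 * sum_i w_i (y_i - x_i.b)^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()           # residual for beta = 0
    col_norm = (w[:, None] * X**2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            if col_norm[j] == 0:
                continue
            resid += X[:, j] * beta[j]       # drop covariate j from the fit
            rho = np.sum(w * X[:, j] * resid)
            # soft-thresholding: small effects are set exactly to zero
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_norm[j]
            resid -= X[:, j] * beta[j]
    return beta

def fit_mixture(X, y, k=2, lam=1.0, n_em=30, seed=0):
    """EM for a k-component mixture of lasso-penalized linear regressions."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    resp = rng.dirichlet(np.ones(k), size=n)  # random soft assignments
    betas = np.zeros((k, p))
    sigma2 = np.full(k, np.var(y))
    pi = np.full(k, 1.0 / k)
    for _ in range(n_em):
        # M-step: one weighted lasso fit per submodel
        for c in range(k):
            w = resp[:, c]
            betas[c] = weighted_lasso(X, y, w, lam)
            r = y - X @ betas[c]
            sigma2[c] = max(np.sum(w * r**2) / max(w.sum(), 1e-12), 1e-6)
            pi[c] = w.mean()
        # E-step: posterior probability of each submodel for each sample
        log_dens = np.empty((n, k))
        for c in range(k):
            r = y - X @ betas[c]
            log_dens[:, c] = (np.log(pi[c] + 1e-300)
                              - 0.5 * np.log(2 * np.pi * sigma2[c])
                              - 0.5 * r**2 / sigma2[c])
        log_dens -= log_dens.max(axis=1, keepdims=True)
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)
    return betas, pi, sigma2, resp

# Synthetic example: two hidden groups with opposite slopes on covariate 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
group = rng.integers(0, 2, size=200)
y = np.where(group == 0, 3.0, -3.0) * X[:, 0] + 0.1 * rng.normal(size=200)
betas, pi, sigma2, resp = fit_mixture(X, y, k=2, lam=1.0)
labels = resp.argmax(axis=1)                  # hard submodel assignment per sample
```

In a sketch like this, the hard assignments in `labels` are what would allow samples to be grouped by the submodel that best explains them, while the zeros in `betas` indicate covariates dropped by the lasso penalty. EM can converge to a local optimum, so in practice several random restarts would be advisable.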
This complex but interpretable method was used to train over 10,000 models. For approximately 2,400 cases, two or more submodels were obtained.
Using the models and their submodels, six different clusters of patients were discovered. The clusters were profiled based on clinical information and gene mutations. This analysis revealed a clear separation between younger patients with a higher survival rate (Clusters 1, 2 and 3) and older patients with a lower survival rate (Clusters 4, 5 and 6). Mutations in the gene IDH1 were found almost exclusively in Cluster 2, while Cluster 5 presented a hypermutated profile. Finally, several genes not previously related to GBM, such as C15orf2 and CHEK2, showed a significant presence in the clusters.
The most significant models for each cluster were studied, with a special focus on their covariates. The number of significant models shared between clusters was very small, and well-known GBM-related genes, such as EGFR1 and TP53, appeared as significant covariates in many models. Along with them, ubiquitin-related genes (UBC and UBD) and NRF1, which have not previously been linked to GBM, played a very significant role.
This work showed the potential of using a mixture of linear models to integrate multi-’omic’ data and to group patients in order to profile them and find novel markers. The resulting clusters showed unique profiles, and their significant models and covariates comprised both well-known GBM-related genes and novel markers, which opens the possibility of new approaches to study and attack this disease. The next step of the project is to improve several elements of the methodology to achieve a more detailed analysis of the models and covariates, in particular by taking into account the regression coefficients of the submodels.
Identifier | oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:32248 |
Date | 26 November 2018 |
Creators | Campos Valenzuela, Jaime Alberto |
Contributors | Kaderali, Lars; Schröck, Evelin; Technische Universität Dresden |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Detected Language | English |
Type | doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text |
Rights | info:eu-repo/semantics/openAccess |