Treating cancer is difficult as the disease is complex and drug responses often depend on the patient's characteristics. Precision medicine aims to solve this by selecting individualized treatments. Since this involves the analysis of large datasets, machine learning can be used to make the drug selection process more efficient. Traditionally, such models utilize bulk gene expression data. However, this potentially masks information from small cell populations and fails to address tumor heterogeneity. Therefore, this thesis applies data deconvolution methods to bulk gene expression data and estimates the corresponding cell type-specific gene expression profiles. This "increases" the resolution of the input data for the drug response prediction. A hold-out dataset, LODOCV and LOCOCV were used for the evaluation of this approach. Furthermore, all results are compared against a baseline model, which was trained on bulk data. Overall, the accuracy of the cell type-specific model did not show an improvement compared to the bulk model. It also prioritizes information from bulk samples, which makes the additional data unnecessary. The robustness of the cell type-specific model is slightly lower than that of the bulk model. Note, that these outcomes are not necessarily due to a flaw in the underlying concept, but may be connected to poor deconvolution results as the same reference matrix was used for the deconvolution of all bulk samples regardless of the cancer type or disease.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-205016 |
Date | January 2024 |
Creators | Menacher, Lisa Maria |
Publisher | Linköpings universitet, Statistik och maskininlärning |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Page generated in 0.002 seconds