
Machine learning with The Cancer Genome Atlas head and neck squamous cell carcinoma dataset: improving usability by addressing inconsistency, sparsity, and high-dimensionality

In recent years, more data has become available for the analysis of historical oncology cases. A large dataset that describes over 500 patient cases of Head and Neck Squamous Cell Carcinoma is a potential goldmine for finding ways to improve oncological decision support. Unfortunately, the best approaches for drawing useful inferences from it are unknown. With so much information, from DNA and RNA sequencing to clinical records, we must use computational learning to find associations and biomarkers.
The available data is sparse and inconsistent, and very large for some data types. We processed clinical records with an expert oncologist and used complex modeling methods to substitute (impute) data for cases missing treatment information. We then used machine learning algorithms to test whether the imputed data is useful for predicting patient survival. We saw no difference in the ability to predict patient survival with the imputed data, though imputed treatment variables were more important to the survival models.
To deal with the large number of features in RNA expression data, we used two approaches: using all the data on high-performance computers, and transforming the data into a smaller set of features (sparse principal components, or SPCs). We compared the performance of survival models trained on each representation and saw no differences. However, the SPC models trained more quickly and also allowed us to pinpoint the biological processes each SPC is involved in, which can inform future biomarker discovery.
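To illustrate what a sparse principal component looks like, the following minimal sketch computes one sparse loading vector by truncated power iteration. The abstract does not say which sparse PCA algorithm was used, so this technique is a stand-in, and the function name `sparse_component` and the toy matrix are hypothetical.

```python
import math


def sparse_component(rows, k, iters=100):
    """Return a unit-length loading vector with at most k nonzero weights.

    Illustrative stand-in (truncated power iteration), not the thesis's method.
    rows: list of samples, each a list of feature values.
    """
    n, p = len(rows), len(rows[0])
    # Center each feature so the iteration works on the covariance structure.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]

    v = [1.0 / math.sqrt(p)] * p  # uniform starting direction
    for _ in range(iters):
        # Power step: w = X^T (X v), proportional to covariance times v.
        scores = [sum(x[j] * v[j] for j in range(p)) for x in X]
        w = [sum(X[i][j] * scores[i] for i in range(n)) for j in range(p)]
        # Truncation step: keep only the k largest-magnitude loadings.
        keep = set(sorted(range(p), key=lambda j: abs(w[j]), reverse=True)[:k])
        w = [w[j] if j in keep else 0.0 for j in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # constant data: no direction of variance to find
        v = [c / norm for c in w]
    return v


# Toy data: two strongly co-varying features and two near-constant ones.
toy = [
    [1.0, 1.1, 0.00, 0.01],
    [2.0, 1.9, 0.01, 0.00],
    [3.0, 3.2, 0.00, 0.00],
    [4.0, 3.9, 0.01, 0.01],
    [5.0, 5.1, 0.00, 0.01],
]
loadings = sparse_component(toy, k=2)
```

Because most loadings are exactly zero, each component involves only a small set of features (genes, in the RNA expression setting), which is what makes it possible to relate a component to specific biological processes.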
We also examined ten processed molecular features for survival prediction ability and found some predictive power, though not enough to be clinically useful.
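One common way to quantify survival prediction ability is the concordance index (Harrell's C), sketched below. The abstract does not state which metric was used, so this choice is an assumption, and the function name and example values are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable patient pairs ranked correctly.

    times:  follow-up time for each patient
    events: 1 if death was observed, 0 if the patient was censored
    risks:  predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier death had the higher predicted risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical example: four patients, the third one censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 0.5 corresponds to uninformative predictions and 1.0 to a perfect ranking, so "some predictive power" would show up as a score modestly above 0.5.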

Identifier: oai:union.ndltd.org:uiowa.edu/oai:ir.uiowa.edu:etd-8375
Date: 01 May 2019
Creators: Rendleman, Michael
Contributors: Casavant, Thomas L.
Publisher: University of Iowa
Source Sets: University of Iowa
Language: English
Detected Language: English
Type: thesis
Format: application/pdf
Source: Theses and Dissertations
Rights: Copyright © 2019 Michael Rendleman
