Here, we show that training with multiple noncommunicable diseases (NCDs) is both feasible and beneficial to modeling this class of diseases. We first use data from the Cancer Genome Atlas (TCGA) to train a pan cancer model, and then characterize the information the model has learned about the cancers. In doing this we show that the model has learned concepts that are relevant to the task of cancer classification. We also test the model on datasets derived independently of the TCGA cohort and show that the model is robust to data outside of its training distribution such as precancerous legions and metastatic samples. We then utilize the cancer model as the basis of a transfer learning study where we retrain it on other, non-cancer NCDs. In doing so we show that NCDs with very differing underlying biology contain extractible information relevant to each other allowing for a broader model of NCDs to be developed with existing datasets. We then test the importance of the samples source tissue in the model and find that the NCD class and tissue source may not be independent in our model. To address this, we use the tissue encodings to create augmented samples. We test how successfully we can use these augmented samples to remove or diminish tissue source importance to NCD class through retraining the model. In doing this we make key observations about the nature of concept importance and its usefulness in future neural network explainability efforts.
Identifer | oai:union.ndltd.org:unt.edu/info:ark/67531/metadc2179319 |
Date | 07 1900 |
Creators | Gore, Steven Lee |
Contributors | Azad, Rajeev K., Padilla, Pamela, Mikler, Armin, Shulaev, Vladimir, Mittler, Ron |
Publisher | University of North Texas |
Source Sets | University of North Texas |
Language | English |
Detected Language | English |
Type | Thesis or Dissertation |
Format | Text |
Rights | Public, Gore, Steven Lee, Copyright, Copyright is held by the author, unless otherwise noted. All rights Reserved. |
Page generated in 0.002 seconds