Return to search

Biomarker Identification for Breast Cancer Types Using Feature Selection and Explainable AI Methods

This paper investigates the impact the LASSO, mRMR, SHAP, and Reinforcement Feature Selection techniques on random forest models for the breast cancer subtypes markers ER, HER2, PR, and TN as well as identifying a small subset of biomarkers that could potentially cause the disease and explain them using explainable AI techniques. This is important because in areas such as healthcare understanding why the model makes a specific decision is important it is a diagnostic of an individual which requires reliable AI. Another contribution is using feature selection methods to identify a small subset of biomarkers capable of predicting if a specific RNA sequence will have one of the cancer labels positive. The study begins by obtaining baseline accuracy metric using a random forest model on The Cancer Genome Atlas's breast cancer database to then explore the effects of feature selection, selecting different numbers of features, significantly influencing model accuracy, and selecting a small number of potential biomarkers that may produce a specific type of breast cancer. Once the biomarkers were selected, the explainable AI techniques SHAP and LIME were applied to the models and provided insight into influential biomarkers and their impact on predictions. The main results are that there are some shared biomarkers between some of the subsets that had high influence over the model prediction, LASSO and Reinforcement Feature selection sets scoring the highest accuracy of all sets and obtaining some insight into how the models used the features by using existing explainable AI methods SHAP and LIME to understand how these selected features are affecting the model's prediction.

Identiferoai:union.ndltd.org:ucf.edu/oai:stars.library.ucf.edu:honorstheses-2670
Date01 January 2023
CreatorsLa Rosa Giraud, David E
PublisherSTARS
Source SetsUniversity of Central Florida
LanguageEnglish
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceHonors Undergraduate Theses

Page generated in 0.0022 seconds