Trustworthiness is a roadblock to the mass adoption of artificial intelligence (AI) in medicine. This thesis developed a framework for exploring trustworthiness as it applies to AI in medicine, with respect to common stakeholders in medical device development. Within this framework, the element of explainability was explored by evaluating explainable AI (XAI) methods. The current literature offers a wide array of XAI methods that provide a variety of insights into the learning and function of AI models by producing human-readable representations of the AI's learning process. These methods tend to be bespoke, however, and their outputs are highly subjective and of varying quality. There are currently no metrics or methods for objectively evaluating the output of one XAI method against the outputs of other types of XAI methods. This thesis presents a set of constituent elements (similarity, stability, and novelty) with which to explore the concept of explainability, and then presents a series of metrics to evaluate those constituent elements, providing a repeatable and testable framework for evaluating XAI methods and the explanations they generate. This was accomplished by presenting subject matter expert (SME) annotated ECG signals (time-series signals), represented as images, to AI models and XAI methods.
A small subset of the available XAI methods (Vanilla Saliency, SmoothGrad, GradCAM, and GradCAM++) was used to generate XAI outputs for a VGG-16 based deep learning classification model. The framework provides insight into the explanations an XAI method generates for an AI model and into how closely the model's learning corresponds to SME decision making. It also objectively evaluates how closely the explanations generated by any one XAI method resemble the outputs of other XAI methods. Lastly, the framework provides insight into possible novel learning done by the deep learning model beyond what the SMEs identified in their decision making.
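To make the kind of XAI output under evaluation concrete, the following is a minimal Grad-CAM sketch in Python. It does not reproduce the thesis's pipeline: the ImageNet-weighted Keras VGG-16, the `block5_conv3` target layer, and the random placeholder standing in for an ECG-as-image input are all illustrative assumptions.

```python
# Minimal Grad-CAM sketch for a VGG-16 classifier. A hypothetical stand-in
# for the thesis's ECG-as-image pipeline; model, layer, and input are
# placeholders, not the thesis's actual configuration.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

model = VGG16(weights="imagenet")  # stand-in for the trained ECG classifier

def grad_cam(img, layer_name="block5_conv3", class_idx=None):
    """Return a [0, 1] heatmap of the regions that drive the prediction."""
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(layer_name).output, model.output]
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(img[None, ...])   # add batch dimension
        if class_idx is None:
            class_idx = tf.argmax(preds[0])            # explain the top class
        score = tf.gather(preds[0], class_idx)
    grads = tape.gradient(score, conv_out)             # d(score)/d(feature map)
    weights = tf.reduce_mean(grads, axis=(1, 2))[0]    # pool grads per channel
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy() # normalize to [0, 1]

# Placeholder "ECG rendered as an image"; a real run would load the plot.
img = preprocess_input((np.random.rand(224, 224, 3) * 255.0).astype("float32"))
heatmap = grad_cam(tf.constant(img))                   # 14x14 saliency grid
```

GradCAM++, SmoothGrad, and Vanilla Saliency differ only in how the per-pixel attribution is computed; each yields a comparable saliency map that the framework's metrics can then score against SME annotations.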
/ Thesis / Master of Applied Science (MASc) /
The goal of this thesis was to develop a framework for how trustworthiness can be improved for a variety of stakeholders in the use of AI in medical applications. Trust was broken down into basic elements (Explainability, Verifiability, Fairness, and Robustness), and 'Explainability' was explored further. This was done by determining how the explainability offered by XAI methods can address stakeholder needs (Accuracy, Safety, and Performance) and how those needs can be evaluated. Methods of comparison (similarity, stability, and novelty) were developed that allow an objective evaluation of the explanations from various XAI methods using repeatable metrics (Jaccard, Hamming, Pearson correlation, and TF-IDF). Combining the results of these measurements within the framework of trust works toward improving AI trustworthiness and provides a way to evaluate and compare the utility of explanations.
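Since the comparison metrics named above are standard, they can be sketched directly over saliency maps. The snippet below is a minimal illustration, assuming each XAI explanation has already been reduced to a saliency map on a common grid; the 75th-percentile binarization threshold is an assumption rather than the thesis's published setting, and the TF-IDF novelty measure against SME annotations is omitted.

```python
# Minimal sketch of repeatable explanation-comparison metrics over two
# saliency maps (e.g., GradCAM vs. GradCAM++ outputs for the same ECG image).
import numpy as np
from scipy.stats import pearsonr

def binarize(saliency, q=75):
    """Mark the most salient pixels (top quartile here) as 'explained'."""
    return saliency >= np.percentile(saliency, q)

def jaccard(a, b):
    """Similarity of two binary masks: |A and B| / |A or B|."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0

def hamming(a, b):
    """Fraction of pixels on which two binary masks disagree."""
    return float(np.mean(a != b))

def pearson(a, b):
    """Linear correlation of the raw (un-thresholded) saliency values."""
    return pearsonr(a.ravel(), b.ravel())[0]

cam_a = np.random.rand(14, 14)   # placeholder saliency maps; a real run
cam_b = np.random.rand(14, 14)   # would use two XAI methods' outputs
ma, mb = binarize(cam_a), binarize(cam_b)
print(f"Jaccard={jaccard(ma, mb):.3f}  Hamming={hamming(ma, mb):.3f}  "
      f"Pearson={pearson(cam_a, cam_b):.3f}")
```

In the spirit of the framework, the same metrics applied between an XAI output and an SME annotation mask gauge similarity to expert reasoning; applied across repeated runs or perturbed inputs they gauge stability; and salient regions absent from every SME mask flag candidate novelty.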
Identifier | oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/27787 |
Date | 11 1900 |
Creators | Siddiqui, Mohammad Kashif |
Contributors | Doyle, Thomas, Biomedical Engineering |
Source Sets | McMaster University |
Language | English |
Detected Language | English |
Type | Thesis |