With the growing desire to deploy Artificial Intelligence (AI), concerns over whether AI can “communicate” why it has made its decisions are of particular importance. In this thesis, we utilize predictive entropy (PE) as a surrogate for predictive uncertainty and report it under various test-time conditions that alter the testing distribution. This is done to evaluate the potential for PE to indicate when users should trust or distrust model predictions under dataset shift or out-of-distribution (OOD) conditions, two scenarios that are prevalent in real-world settings. Specifically, we trained an ensemble of three 2D-UNet architectures to segment synthetically damaged regions in fractional anisotropy scalar maps, a widely used diffusion metric for indicating microstructural white-matter damage. Baseline ensemble statistics show that the true positive rate, false negative rate, false positive rate, true negative rate, Dice score, and precision are 0.91, 0.091, 0.23, 0.77, 0.85, and 0.80, respectively. Test-time PE was reported before and after the ensemble was exposed to increasing geometric distortions (OOD), adversarial examples (OOD), and decreasing signal-to-noise ratios (dataset shift). We observed that although PE shows a strong negative correlation with model performance for increasing adversarial severity (ρAE = −1), this correlation is not seen under the distortion or SNR conditions (ρD = −0.26, ρSNR = −0.30). However, the PE variability (PE-Std) between individual model predictions was shown to be a better indicator of uncertainty, as strong negative correlations between model performance and PE-Std were seen for both geometric distortions and adversarial examples (ρD = −0.83, ρAE = −1). Unfortunately, PE fails to report large absolute uncertainties under these conditions, restricting the analysis to correlative relationships. Finally, determining an uncertainty threshold between “certain” and “uncertain” model predictions was shown to be heavily dependent on model calibration. For augmentation conditions close to the training distribution, a single threshold could be hypothesized. However, caution must be taken if such a technique is clinically applied, as model miscalibration could nullify such a threshold for samples far from the distribution. To ensure that PE or PE-Std could be used more broadly for uncertainty estimation, further work must be completed. / Thesis / Master of Applied Science (MASc)
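For readers unfamiliar with these quantities, the sketch below shows how predictive entropy and its variability could be computed from the softmax outputs of an ensemble. This is a minimal NumPy illustration, assuming PE is the entropy of the mean ensemble probability map and PE-Std is the standard deviation of the per-member entropies; the function names, array shapes, and exact definitions are assumptions for illustration, not the thesis's implementation.

```python
import numpy as np

def predictive_entropy(probs):
    """Voxel-wise entropy H(p) = -sum_c p_c * log(p_c).

    probs: array of shape (..., C) holding class probabilities that sum to 1
    along the last axis.
    """
    eps = 1e-12  # guard against log(0)
    return -np.sum(probs * np.log(probs + eps), axis=-1)

def ensemble_pe_and_pe_std(member_probs):
    """Compute PE and PE-Std for one image from per-member probability maps.

    member_probs: array of shape (M, H, W, C), the softmax outputs of the
    M ensemble members (M = 3 in the thesis).
    Returns:
        pe     : entropy of the mean ensemble prediction, shape (H, W)
        pe_std : standard deviation of the per-member entropies, shape (H, W)
    """
    mean_probs = member_probs.mean(axis=0)             # average over members
    pe = predictive_entropy(mean_probs)                # predictive entropy (PE)
    per_member_pe = predictive_entropy(member_probs)   # shape (M, H, W)
    pe_std = per_member_pe.std(axis=0)                 # PE variability (PE-Std)
    return pe, pe_std

# Hypothetical usage: three members, binary segmentation (damage vs. background)
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(3, 128, 128, 2))
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    pe, pe_std = ensemble_pe_and_pe_std(probs)
    print(pe.mean(), pe_std.mean())
```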
Identifier | oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/29567
Date | January 2021
Creators | McCrindle, Brian |
Contributors | Noseworthy, Michael, Doyle, Thomas E., Electrical and Computer Engineering |
Source Sets | McMaster University |
Language | English |
Detected Language | English |
Type | Thesis |