
Interpretability for Deep Learning Text Classifiers

The ubiquitous presence of automated decision-making systems whose performance is comparable to that of humans has drawn attention to the need for interpretability of the generated predictions. Whether the goal is predicting the system’s behavior when the input changes, building user trust, or assisting experts in improving the machine learning methods, interpretability is paramount when the problem has not been sufficiently validated in real applications and when unacceptable results lead to significant consequences.
While there are no standard interpretations for the decisions humans make, the complexity of systems with advanced information-processing capacities conceals the detailed explanations for individual predictions, burying them under layers of abstraction and complex mathematical operations. Interpretability for deep learning classifiers thus becomes a challenging research topic, where the ambiguity of the problem statement allows for multiple exploratory paths.
Our work focuses on generating natural language interpretations for individual predictions of deep learning text classifiers. We propose a framework for extracting and identifying the phrases of the training corpus that influence the prediction confidence the most, through unsupervised key phrase extraction and neural predictions. We assess the contribution margin that the added justification provides when the deep learning model predicts the class probability of a text instance, introducing a contribution metric that quantifies the fidelity of the explanation to the model. We assess both the performance impact of the proposed approach on the classification task, through quantitative analysis, and the quality of the generated justifications, through extensive qualitative and error analysis.
This methodology manages to capture the most influential phrases of the training corpus as explanations that reveal the linguistic features used for individual test predictions, allowing humans to predict the behavior of the deep learning classifier.

Identifier: oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/41564
Date: 14 December 2020
Creators: Lucaci, Diana
Contributors: Inkpen, Diana
Publisher: Université d'Ottawa / University of Ottawa
Source Sets: Université d’Ottawa
Language: English
Detected Language: English
Type: Thesis
Format: application/pdf
