How does Uncertainty Impact Explanation Coherence?

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
TL;DR: By introducing noise into test data, we show that increased uncertainty at test time does not necessarily imply explanation incoherence.
Abstract: Explainable AI methods facilitate the understanding of model behaviour. However, small, imperceptible perturbations to inputs can vastly distort explanations. As these explanations are typically evaluated holistically, before model deployment, it is difficult to assess when a particular explanation is trustworthy. In contrast, uncertainty is easy to measure at inference time and in an unsupervised fashion. Some studies have tried to build confidence estimators for explanations, but none have investigated whether a link exists between uncertainty and explanation quality. We artificially simulate epistemic uncertainty in text inputs by introducing noise at inference time. In this large-scale empirical study, we inject varying levels of noise through a range of perturbation strategies and measure the effect on the outputs and uncertainty metrics of pre-trained language models (PLMs). We find that uncertainty and explanation coherence exhibit a task-dependent correlation, which can be moderately positive and likely stems from noise encountered during training; this suggests that these models may be better at identifying salient tokens when uncertain, which can be exploited for human-AI collaboration. While this quality can be at odds with robustness to noise, Integrated Gradients typically shows good robustness while retaining a relatively strong correlation with uncertainty on perturbed data. This suggests that uncertainty is not only an indicator of output reliability but also a potential indicator of explanation coherence.
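The abstract does not include code, so the following is only a minimal sketch of the kind of pipeline it describes: perturb a text input at inference time and observe how an uncertainty measure responds. The choice of model (a Hugging Face sequence classifier as a stand-in PLM), the character-swap noise scheme, and predictive entropy as the uncertainty metric are all illustrative assumptions, not necessarily the paper's setup.

```python
# Illustrative sketch only: add noise to a text input at inference time and
# measure how predictive uncertainty (here, softmax entropy) changes.
# Model, noise scheme, and uncertainty metric are assumptions, not the paper's.
import random
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"  # stand-in PLM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def add_char_noise(text: str, level: float, seed: int = 0) -> str:
    """Randomly replace a fraction `level` of characters with random letters."""
    rng = random.Random(seed)
    chars = list(text)
    n_swaps = int(level * len(chars))
    for i in rng.sample(range(len(chars)), k=n_swaps):
        chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

@torch.no_grad()
def predictive_entropy(text: str) -> float:
    """Entropy of the output softmax, a simple proxy for model uncertainty."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze(0)
    return float(-(probs * probs.log()).sum())

sentence = "The film was a quiet, moving triumph."
for level in (0.0, 0.1, 0.2, 0.3):
    noisy = add_char_noise(sentence, level)
    print(f"noise={level:.1f}  entropy={predictive_entropy(noisy):.3f}")
```

On the explanation side, the abstract names Integrated Gradients; in practice one could compute attributions for the clean and perturbed inputs (e.g. with an attribution library) and compare their agreement, but that step is omitted here to keep the sketch short.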
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability
Languages Studied: English