Variational saliency maps for explaining model's behavior

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Interpretability, XAI, Variational Inference
Abstract: Saliency maps have been widely used to explain the behavior of image classifiers. We introduce a new interpretability method that treats a saliency map as a random variable and aims to compute the posterior distribution over saliency maps. The likelihood function is designed to measure the distance between the classifier's predictive probability on an image and its predictive probability on a locally perturbed version of that image. For the prior distribution, we make the attributions of adjacent pixels positively correlated. We use a variational approximation and show that the approximate posterior is effective in explaining the classifier's behavior. It also has the benefit of providing uncertainty over the explanation, giving experts auxiliary information about how trustworthy the explanation is.
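
To make the recipe in the abstract concrete, below is a minimal sketch (not the authors' code or the supplementary material) of one way such a variational posterior could be fit in PyTorch: a factorized Gaussian q(m) over per-pixel attributions, a likelihood term that keeps the classifier's prediction on a perturbed image close to its prediction on the original, and a smoothness penalty standing in for the positively correlated spatial prior. The function name fit_variational_saliency, the blur-based perturbation, and all hyperparameters are illustrative assumptions.

import torch
import torch.nn.functional as F

def fit_variational_saliency(model, image, steps=500, lr=0.05,
                             smooth_weight=1.0, n_samples=4):
    """Fit a factorized Gaussian q(m) = N(mu, sigma^2) over a per-pixel
    saliency map m for a single image of shape (1, C, H, W). Sketch only."""
    model.eval()
    _, _, H, W = image.shape
    mu = torch.zeros(1, 1, H, W, requires_grad=True)
    log_sigma = torch.full((1, 1, H, W), -2.0, requires_grad=True)
    opt = torch.optim.Adam([mu, log_sigma], lr=lr)

    with torch.no_grad():
        p_orig = F.softmax(model(image), dim=1)  # reference prediction
        blurred = F.avg_pool2d(image, 11, stride=1, padding=5)  # perturbation baseline

    for _ in range(steps):
        opt.zero_grad()
        loss = torch.zeros(())
        for _ in range(n_samples):
            # Reparameterized sample of the saliency map, squashed to [0, 1].
            eps = torch.randn_like(mu)
            m = torch.sigmoid(mu + eps * log_sigma.exp())
            # Local perturbation: keep salient pixels, blur the rest.
            perturbed = m * image + (1 - m) * blurred
            p_pert = F.softmax(model(perturbed), dim=1)
            # "Likelihood" term: the prediction on the perturbed image
            # should stay close to the prediction on the original image.
            loss = loss + F.kl_div(p_pert.log(), p_orig, reduction="batchmean")
            # Stand-in for the spatial prior: penalize differences between
            # adjacent attributions, i.e. favor positive local correlation.
            tv = ((m[..., 1:, :] - m[..., :-1, :]) ** 2).mean() + \
                 ((m[..., :, 1:] - m[..., :, :-1]) ** 2).mean()
            loss = loss + smooth_weight * tv
        # Small entropy bonus keeps the posterior variance from collapsing.
        loss = loss / n_samples - 0.01 * log_sigma.mean()
        loss.backward()
        opt.step()

    # mu gives the saliency estimate; sigma quantifies its uncertainty.
    return torch.sigmoid(mu).detach(), log_sigma.exp().detach()

Under these assumptions, the returned mu plays the role of the saliency map, while sigma is the per-pixel uncertainty that, as the abstract argues, helps experts judge how trustworthy the explanation is.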
One-sentence Summary: A posterior distribution over saliency maps that explains a model's behavior and provides uncertainty over the explanation.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=4xAzX8TfD
