Explaining Image Classifiers by Counterfactual Generation

Chun-Hao Chang, Elliot Creager, Anna Goldenberg, David Duvenaud

Sep 27, 2018 ICLR 2019 Conference Blind Submission
  • Abstract: When a black-box classifier processes an input example to render a prediction, which input features are relevant and why? We propose to answer this question by efficiently marginalizing over the universe of plausible alternative values for a subset of features, obtained by conditioning a generative model of the input distribution on the remaining features. In contrast with recent approaches that compute alternative feature values ad hoc, generating counterfactual inputs far from the natural data distribution, our model-agnostic method produces realistic explanations, generating plausible inputs that either preserve or alter the classification confidence. When applied to image classification, our method produces more compact and relevant per-feature saliency assignments, with fewer artifacts than previous methods.
  • Keywords: Explainability, Interpretability, Generative Models, Saliency Map, Machine Learning, Deep Learning
  • TL;DR: We compute saliency by using a strong generative model to efficiently marginalize over plausible alternative inputs, revealing concentrated pixel areas that preserve label information.
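
The marginalization described in the abstract can be illustrated with a minimal Monte Carlo sketch. This is not the authors' implementation: `marginal_prediction`, `toy_classifier`, and `toy_infill` are hypothetical names, the classifier is a fixed logistic model, and a simple Gaussian centered on the mean of the kept features stands in for the strong conditional generative model the submission relies on. The sketch only shows the shape of the computation: sample plausible values for the masked features, keep the observed ones, and average the classifier's output.

```python
import numpy as np

def marginal_prediction(classifier, infill_sampler, x, mask, n_samples=64, rng=None):
    """Monte Carlo average of classifier outputs over sampled infills.

    mask is a boolean array: True marks features that are removed and
    re-sampled; False marks features that are kept as observed.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = []
    for _ in range(n_samples):
        x_hat = infill_sampler(x, mask, rng)   # proposed values for every feature
        x_mix = np.where(mask, x_hat, x)       # only masked features are replaced
        probs.append(classifier(x_mix))
    return np.mean(probs, axis=0)

# --- Hypothetical stand-ins, for illustration only ---

def toy_classifier(x):
    # Logistic model that depends only on the first two features.
    w = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])
    p = 1.0 / (1.0 + np.exp(-(w @ x)))
    return np.array([1.0 - p, p])

def toy_infill(x, mask, rng):
    # Assumption: a Gaussian around the mean of the kept features replaces
    # a learned conditional generative model.
    kept_mean = x[~mask].mean()
    return rng.normal(loc=kept_mean, scale=0.1, size=x.shape)

x = np.array([2.0, 2.0, 0.0, 0.0, 0.0, 0.0])
rng = np.random.default_rng(0)

p_orig = toy_classifier(x)

# Mask features the classifier ignores: the prediction should be preserved.
irrelevant = np.array([False, False, True, True, True, True])
p_irrelevant = marginal_prediction(toy_classifier, toy_infill, x, irrelevant, rng=rng)

# Mask the features the classifier uses: confidence should drop.
relevant = ~irrelevant
p_relevant = marginal_prediction(toy_classifier, toy_infill, x, relevant, rng=rng)
```

In this toy setting, infilling the ignored features leaves the class probability unchanged, while infilling the two informative features pulls it toward chance. That contrast is the signal a saliency method of this kind would use to separate regions that preserve label information from those that do not.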