Simple Localized Counterfactuals for Visual Explanation

CVPR 2026 Workshop HOW Proceedings Track Submission 20 Authors

Published: 21 Mar 2026, Last Modified: 03 Jun 2026, Venue: HOW 2026, Readers: Everyone, License: CC BY 4.0
Include In Proceedings: Yes, include in CVPR proceedings
Public: Yes
Keywords: Explainability, Interpretability, Counterfactuals, Visual Counterfactuals, Sparse Autoencoders
TL;DR: We argue that for explanation, locality should take precedence over photorealism, and propose a lightweight alternative built on a simple auto-encoder, avoiding heavy generative architectures.
Abstract: Visual counterfactual explanations aim to reveal the features driving a model's decision by introducing minimal image changes that flip its prediction. For effective interpretation, these changes should be localized, yet most existing methods alter large image regions because they rely on generative models for photorealism. Although recent work introduces region constraints, such approaches remain complex and dependent on heavy backbones. We argue that for explanation, locality should take precedence over photorealism, and propose a lightweight alternative built on a simple auto-encoder, avoiding heavy generative architectures. This design naturally constrains locality while latent-space editing suppresses adversarial artifacts. On top of this, we introduce two components: aggregated gradients in latent space to further enhance locality, and a sparse concept objective to encourage semantically meaningful changes. Together, these yield valid, localized, and interpretable counterfactuals. Experiments on CelebA, CelebA-HQ, ImageNet, and CUB show that our approach produces faithful explanations that highlight decision-driving features and uncover novel discriminative traits.
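
The latent-space editing idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear encoder `E`, decoder `D`, classifier `w`, the dimensions, and the function `latent_counterfactual` are all hypothetical stand-ins, and the L1 penalty on the latent shift is only a simple proxy for sparsity, not the paper's sparse concept objective. The aggregated-gradient component is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 4                      # hypothetical pixel and latent sizes
D = rng.normal(size=(d, k))       # toy linear decoder: x_hat = D @ z
E = np.linalg.pinv(D)             # toy linear encoder: z = E @ x
w = rng.normal(size=d)            # toy linear classifier: logit = w @ x


def predict(x):
    """Return the binary class (0 or 1) of the toy classifier."""
    return int(w @ x > 0)


def latent_counterfactual(x, target, lr=0.1, l1=0.01, max_steps=500):
    """Shift the latent code of x until the decoded image is classified as target."""
    z0 = E @ x
    z = z0.copy()
    sign = 1.0 if target == 1 else -1.0
    g = D.T @ w                   # d(logit)/dz, constant for this linear toy
    for _ in range(max_steps):
        x_cf = D @ z
        if predict(x_cf) == target:
            break                 # prediction flipped: stop with a minimal edit
        logit = w @ x_cf
        # Gradient of the logistic loss softplus(-sign * logit) w.r.t. the logit.
        p = 1.0 / (1.0 + np.exp(sign * logit))
        # Classification gradient plus an L1 subgradient on the latent shift.
        grad = -sign * p * g + l1 * np.sign(z - z0)
        z -= lr * grad
    return D @ z, z - z0


# Usage: start from an image the toy decoder can represent and flip its class.
x = D @ rng.normal(size=k)
target = 1 - predict(x)
x_cf, delta_z = latent_counterfactual(x, target)
```

Because every edit passes through the decoder, the counterfactual stays within what the auto-encoder can represent, which is the intuition behind editing in latent space rather than directly on pixels.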
Submission Number: 20