A Reproducibility Study of Counterfactual Explanations for Image Classification

TMLR Paper 6293 Authors

23 Oct 2025 (modified: 02 Nov 2025) · Under review for TMLR · CC BY 4.0
Abstract: Counterfactual explanations have gained traction in recent years due to their contrastive and potentially actionable nature: they show how to move an outcome from the original class to an alternative target class. Generating plausible and accurate counterfactuals remains challenging, however. We highlight two underexplored but critical factors influencing counterfactual quality for image classifiers: the neural network architecture and the chosen target class. This work presents a comprehensive empirical evaluation of multiple counterfactual explanation methods across diverse neural architectures and all possible target classes on the MNIST and CIFAR-10 datasets. Our results show that performance can vary substantially across architectures and targets, an aspect often overlooked in prior evaluations. To better assess counterfactual plausibility, we introduce a novel evaluation method based on Moran's index, a spatial autocorrelation metric. This allows us to systematically identify and exclude structurally implausible counterfactuals that existing metrics may overlook. We find that counterfactual explanation methods often fail to generate counterfactuals for the intended target classes, due to factors such as timeouts, restrictive search spaces, or implementation issues. Furthermore, our analysis demonstrates that evaluating explanations on only one target class or architecture provides an incomplete and potentially misleading picture of performance. Additionally, we show that different plausibility metrics do not consistently agree, emphasising the need for more robust evaluation frameworks. In summary, we (i) identify architecture and target class as key overlooked dimensions of counterfactual explanation performance, (ii) propose a novel plausibility assessment method using Moran's index, and (iii) provide actionable insights for developing and evaluating more generalisable counterfactual explanation methods.
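For readers unfamiliar with the statistic, Moran's I measures spatial autocorrelation: values near 0 indicate spatially random, noise-like pixel perturbations, while values near 1 indicate smooth, spatially coherent structure. The sketch below illustrates the computation under an assumed rook (4-neighbour) adjacency over image pixels; the function name and adjacency scheme are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def morans_i(img):
    """Moran's I for a 2D image under an assumed rook (4-neighbour) adjacency.

    I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z holds the
    mean-centred pixel values, w_ij = 1 for grid neighbours, and W = sum w_ij.
    """
    z = np.asarray(img, dtype=float)
    z = z - z.mean()
    denom = (z ** 2).sum()
    if denom == 0.0:
        return float("nan")  # constant image: Moran's I is undefined
    # Cross-products over unordered horizontal and vertical neighbour pairs.
    pair_sum = (z[:, :-1] * z[:, 1:]).sum() + (z[:-1, :] * z[1:, :]).sum()
    h, w = z.shape
    n_pairs = h * (w - 1) + (h - 1) * w  # unordered adjacent pixel pairs
    # Directed links double both the numerator and W, so the factors cancel.
    return (z.size / n_pairs) * pair_sum / denom

rng = np.random.default_rng(0)
print(morans_i(rng.normal(size=(28, 28))))                      # near 0: noise-like
print(morans_i(np.add.outer(np.arange(28.0), np.arange(28.0)))) # near 1: smooth
```

Under this reading, a counterfactual whose perturbation scores near 0 resembles adversarial noise rather than a structurally coherent edit, which is presumably how such a metric can flag implausible counterfactuals that other metrics miss.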
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Dmitry_Kangin1
Submission Number: 6293