A Robust Perturbation Strategy and Evaluation Benchmark for Accurate SHAP Attribution in Vision Models

Published: 18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · License: CC BY 4.0
Keywords: Explainable AI, SHAP, Saliency Attribution, Perturbation Strategy, Computer Vision, Feature Attribution, Robustness, Ground Truth Benchmark, Vision Models, Model Interpretability
TL;DR: We propose a robust perturbation strategy and the first vision benchmark with pixel-level Shapley ground truth to enable accurate and reliable saliency evaluation in vision models.
Abstract: Understanding the decisions of vision models is essential for transparency and trust. Shapley values provide a principled approach to feature attribution, yet their application to vision is hindered by perturbation strategies that either fail to exclude information or introduce artifacts. We propose the Mean-Distance Perturbation (MDP) strategy, a simple and deterministic method that replaces image regions with maximally dissimilar colors in RGB or grayscale space. Unlike blurring, inpainting, or uniform fills, MDP offers a safer perturbation mechanism that more reliably suppresses class-relevant evidence. We further present SHAPEval, the first vision benchmark with analytically derived, pixel-level Shapley ground truth. Although intentionally simple, SHAPEval provides a reproducible baseline for controlled attribution evaluation and establishes a foundation for future benchmarks on more complex image data. Experiments on SHAPEval and natural datasets show that MDP consistently outperforms standard perturbation strategies across multiple models. Together, MDP and SHAPEval deliver a reproducible starting point for safer and more reliable explainability in computer vision. By exposing how perturbation choices alter model confidence and attribution alignment, our work directly links explanation quality to the robustness of learned representations.
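The abstract's description of MDP — replacing an image region with a color maximally dissimilar from that region in RGB or grayscale space — can be sketched as follows. This is an illustrative interpretation only, not the paper's implementation: the function name `mdp_fill` and the choice of measuring dissimilarity per channel against the region's mean (snapping each channel to whichever extreme, 0 or 255, is farther from that mean) are assumptions for the sketch.

```python
import numpy as np

def mdp_fill(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of Mean-Distance Perturbation (MDP).

    Replaces the masked region of an 8-bit image with the fill color
    maximally distant, per channel, from the region's mean color.
    `image` is HxWxC (or HxW for grayscale); `mask` is a boolean HxW array.
    """
    out = image.copy()
    region = image[mask]                # (n_pixels, C) or (n_pixels,)
    mean = region.mean(axis=0)          # per-channel mean of the region
    # For 8-bit values, the point farthest from the mean along each
    # channel is one of the extremes, 0 or 255 - pick the farther one.
    fill = np.where(mean < 128, 255, 0).astype(image.dtype)
    out[mask] = fill
    return out
```

A deterministic fill like this avoids the randomness of noise-based baselines and, unlike blurring or inpainting, cannot leak structure from the occluded region back into the model's input.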
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 11235