ERS*: A Bounded, Attribution-Agnostic Metric for Explainable Robustness in Image Recognition

ICLR 2026 Conference Submission 25011 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Explainable Robustness Score, attribution stability, saliency maps, Grad-CAM, EigenCAM, attention rollout, LRP, RISE, ensemble attribution, Vision Transformer, Swin Transformer, ResNet-50, traffic sign recognition, physical perturbations, natural corruptions, CIFAR-C, ImageNet-C, autonomous driving, post-hoc evaluation, bounded metric
TL;DR: ERS* is a bounded, attribution-agnostic metric that combines performance degradation and explanation stability to expose when vision models and ensembles stay accurate but reason inconsistently under real-world perturbations.
Abstract: Deep vision models can remain accurate under perturbations while shifting their internal reasoning, which is risky for safety-critical use. We introduce ERS*, a bounded metric (in [0,1]) for explainable robustness that jointly scores (i) normalized performance degradation and (ii) explanation stability between clean and perturbed inputs. Stability is computed across multiple attribution families (Grad-CAM/EigenCAM, attention rollout, LRP, RISE), and we define an ensemble-level attribution via probability-weighted fusion to evaluate ensembles directly. We study ViT-B/16, Swin-T, ResNet-50, and their soft-voting ensemble on a traffic-sign benchmark with ten calibrated physical perturbation suites (fading, dirt splatter, scratches, peeling/rust, etc.), and further demonstrate generality on natural corruption benchmarks beyond traffic signs (CIFAR-C, ImageNet-C). ERS* reveals cases where accuracy stays high but explanations become unstable, with ensembles sometimes achieving strong accuracy yet lower explanation stability than expected. Sensitivity analyses show ERS* rankings are stable across weight choices and attribution methods, and localization metrics plus a small human study indicate that higher ERS* aligns with perceived explanation quality. ERS* complements accuracy and standard robustness metrics (e.g., robust accuracy under corruption) by diagnosing explanation stability, providing a practical post-hoc tool for evaluating reliability and explainability in image recognition.
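The abstract does not give the exact ERS* formula, so the following is only a minimal sketch of the two ingredients it names: probability-weighted fusion of per-model attribution maps for ensembles, and a bounded score combining normalized performance degradation with clean-vs-perturbed explanation stability. The function names (fuse_attributions, stability, ers_star), the convex-combination weights w_perf and w_stab, and the cosine-based stability measure are all assumptions for illustration, not the authors' definitions.

```python
import numpy as np

def fuse_attributions(attribution_maps, class_probs):
    """Hypothetical ensemble-level attribution: probability-weighted fusion.

    attribution_maps: list of HxW arrays, one per ensemble member.
    class_probs: each member's softmax probability for the predicted
                 class, used here as the fusion weight (an assumption).
    """
    weights = np.asarray(class_probs, dtype=float)
    weights = weights / weights.sum()  # normalize weights to sum to 1
    return sum(w * a for w, a in zip(weights, attribution_maps))

def stability(attr_clean, attr_pert, eps=1e-8):
    """Explanation stability as cosine similarity of flattened maps,
    affinely mapped from [-1, 1] to [0, 1]. One choice among many."""
    a, b = attr_clean.ravel(), attr_pert.ravel()
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))
    return 0.5 * (cos + 1.0)

def ers_star(acc_clean, acc_pert, attr_clean, attr_pert,
             w_perf=0.5, w_stab=0.5):
    """Sketch of an ERS*-style score: convex combination of performance
    retention and explanation stability, so the result stays in [0, 1]."""
    # Normalized performance degradation, clipped to keep retention bounded.
    degradation = np.clip((acc_clean - acc_pert) / max(acc_clean, 1e-8), 0.0, 1.0)
    retention = 1.0 - degradation
    return w_perf * retention + w_stab * stability(attr_clean, attr_pert)
```

Because both terms lie in [0, 1] and the weights form a convex combination, the sketch preserves the boundedness property the abstract claims; the sensitivity analysis described there would correspond to varying w_perf and w_stab and swapping the attribution family behind the maps.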
Primary Area: interpretability and explainable AI
Submission Number: 25011