Pixel-level Certified Explanations via Randomized Smoothing

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose a pixel-level certification framework for any black-box attribution method using Randomized Smoothing, ensuring robustness against $\ell_2$-bounded noise with new robustness and localization metrics
Abstract: Post-hoc attribution methods aim to explain deep learning predictions by highlighting influential input pixels. However, these explanations are highly non-robust: small, imperceptible input perturbations can drastically alter the attribution map while maintaining the same prediction. This vulnerability undermines their trustworthiness and calls for rigorous robustness guarantees of pixel-level attribution scores. We introduce the first certification framework that guarantees pixel-level robustness for any black-box attribution method using randomized smoothing. By sparsifying and smoothing attribution maps, we reformulate the task as a segmentation problem and certify each pixel's importance against $\ell_2$-bounded perturbations. We further propose three evaluation metrics to assess certified robustness, localization, and faithfulness. An extensive evaluation of 12 attribution methods across 5 ImageNet models shows that our certified attributions are robust, interpretable, and faithful, enabling reliable use in downstream tasks. Our code is at [https://github.com/AlaaAnani/certified-attributions](https://github.com/AlaaAnani/certified-attributions).
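To make the certification idea concrete, below is a minimal, hypothetical sketch of per-pixel randomized smoothing for a black-box attribution method, in the spirit of the abstract: attribution maps are computed under Gaussian input noise, sparsified to a binary "important / not important" map (top-k), and each pixel is certified via a one-sided binomial lower bound on its vote frequency, yielding a Cohen-et-al.-style $\ell_2$ radius. The function name, parameters (`sigma`, `n_samples`, `top_k_frac`, `alpha`), and the plain per-pixel loop are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
import numpy as np
from scipy.stats import norm, binomtest

def certify_attribution_pixels(image, attribution_fn, sigma=0.25, n_samples=100,
                               top_k_frac=0.15, alpha=0.001):
    """Illustrative sketch (not the paper's code): certify pixel-level importance
    of a black-box attribution method under Gaussian input noise."""
    h, w = image.shape[-2:]
    votes = np.zeros((h, w), dtype=np.int64)  # how often each pixel lands in the top-k

    for _ in range(n_samples):
        noisy = image + np.random.randn(*image.shape) * sigma  # L2-bounded noise model
        attr = attribution_fn(noisy)                           # black-box attribution map, shape (H, W)
        k = int(top_k_frac * attr.size)
        thresh = np.partition(attr.ravel(), -k)[-k]            # sparsify: keep the top-k pixels
        votes += (attr >= thresh).astype(np.int64)

    certified = np.zeros((h, w), dtype=bool)
    radius = np.zeros((h, w), dtype=np.float64)
    for i in range(h):
        for j in range(w):
            # one-sided (Clopper-Pearson) lower confidence bound on the probability
            # that this pixel is marked important under noise
            p_lo = binomtest(int(votes[i, j]), n_samples, alternative="greater") \
                .proportion_ci(confidence_level=1 - alpha).low
            if p_lo > 0.5:
                certified[i, j] = True
                radius[i, j] = sigma * norm.ppf(p_lo)  # certified L2 radius for this pixel
    return certified, radius
```

In this sketch, a pixel is certified as important only if it stays in the sparsified map for a provable majority of noisy inputs, which mirrors the paper's reformulation of attribution certification as a segmentation-style, per-pixel decision problem.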
Lay Summary: When AI models make decisions, like identifying objects in images, we often try to understand why by looking at which parts of the image influenced the prediction. But these explanations can be unreliable: even tiny, invisible changes to the image can completely change what the AI says is important, even though its answer doesn’t change. We’ve developed a new method that makes these explanations much more stable and trustworthy. It works with any existing explanation technique and shows which parts of an image truly matter, even if the image is slightly altered. We also created new ways to measure how reliable and useful these explanations are. Our tests on many AI models show that our approach makes AI explanations clearer, more consistent, and safer to use in real applications.
Link To Code: https://github.com/AlaaAnani/certified-attributions
Primary Area: Deep Learning->Robustness
Keywords: explainability robustness, robustness certification, explainability, certification, robustness, certified attributions
Submission Number: 11648