Improving Perturbation-based Explanations by Understanding the Role of Uncertainty Calibration

Thomas Decker; Volker Tresp; Florian Buettner

Improving Perturbation-based Explanations by Understanding the Role of Uncertainty Calibration

Thomas Decker, Volker Tresp, Florian Buettner

Published: 18 Sept 2025, Last Modified: 29 Oct 2025NeurIPS 2025 spotlightEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Explainable AI, Uncertainty Calibration, Perturbation-based Explanations

Abstract: Perturbation-based explanations are widely utilized to enhance the transparency of machine-learning models in practice. However, their reliability is often compromised by the unknown model behavior under the specific perturbations used. This paper investigates the relationship between uncertainty calibration - the alignment of model confidence with actual accuracy - and perturbation-based explanations. We show that models systematically produce unreliable probability estimates when subjected to explainability-specific perturbations and theoretically prove that this directly undermines global and local explanation quality. To address this, we introduce ReCalX, a novel approach to recalibrate models for improved explanations while preserving their original predictions. Empirical evaluations across diverse models and datasets demonstrate that ReCalX consistently reduces perturbation-specific miscalibration most effectively while enhancing explanation robustness and the identification of globally important input features.

Supplementary Material: zip

Primary Area: Social and economic aspects of machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)

Submission Number: 13021

Loading