Keywords: Distribution Shift, Wasserstein Distance, Counterfactual Robustness
TL;DR: We develop a Wasserstein-based framework that bounds data shifts and provides robust counterfactual explanations.
Abstract: Counterfactual explanations (CEs) are a powerful method for interpreting machine learning models, but a CE may become invalid when the model is updated in response to distribution shifts in the underlying data. Existing approaches to robust CEs often impose explicit bounds on model parameters to ensure stability, but such bounds can be difficult to estimate and overly restrictive in practice. In this work, we propose a data-shift-driven probabilistic framework for robust counterfactual explanations that models plausible data shifts via a Wasserstein ball. We formalize a linearized Wasserstein perturbation scheme that captures realistic distributional changes and enables Monte Carlo estimation of CE robustness probabilities under domain-specific data-shift tolerances. Theoretical analysis shows that our framework is equivalent in spirit to parameter-bounding approaches, while offering greater flexibility and avoiding the need to estimate maximal model parameter shifts.
Experiments on real-world datasets demonstrate that the proposed method maintains high robustness of CEs under plausible distribution shifts, outperforming conventional parameter-bounding techniques in both validity and proximity cost.
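The Monte Carlo robustness estimate described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's method: it assumes a logistic model refit under translation-only data shifts, which makes the Wasserstein-2 distance between the original and shifted empirical distributions exactly the translation norm; the function names (`fit_logreg`, `robustness_probability`) and all parameters are hypothetical. The paper's linearized Wasserstein perturbation scheme admits richer shifts than pure translations.

```python
import numpy as np

def fit_logreg(X, y, lr=0.1, iters=500):
    """Fit a logistic regression by gradient descent (bias appended)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def ce_valid(w, x_ce):
    """A CE stays valid if the (re-fit) model still gives it the positive class."""
    return float(np.dot(w[:-1], x_ce) + w[-1]) > 0.0

def robustness_probability(X, y, x_ce, eps, n_trials=200, seed=0):
    """Monte Carlo estimate of the probability that the CE x_ce remains valid
    when the data distribution shifts within a Wasserstein-2 ball of radius eps
    (here restricted to translation shifts, for which W2 equals the shift norm)."""
    rng = np.random.default_rng(seed)
    valid = 0
    for _ in range(n_trials):
        # Random direction with radius <= eps: translating every point by delta
        # moves the empirical distribution by exactly ||delta|| in W2.
        delta = rng.normal(size=X.shape[1])
        delta *= rng.uniform(0.0, eps) / (np.linalg.norm(delta) + 1e-12)
        w_shift = fit_logreg(X + delta, y)  # model update after the data shift
        valid += ce_valid(w_shift, x_ce)
    return valid / n_trials
```

A usage pattern would be to fit a base model, compute a CE for a rejected instance, and then report `robustness_probability` at the shift tolerance eps chosen for the domain; with eps = 0 the estimate reduces to ordinary (non-robust) validity.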
Submission Number: 184