Keywords: Distribution Shift, Wasserstein Distance, Counterfactual Robustness
TL;DR: We develop a Wasserstein-based framework that bounds data shifts and provides robust counterfactual explanations.
Abstract: Counterfactual explanations (CEs) are a powerful method for interpreting machine learning models, but a CE may become invalid when the model is updated in response to distribution shifts in the underlying data. Existing approaches to robust CEs often impose explicit bounds on model parameters to ensure stability, but such bounds can be difficult to estimate and overly restrictive in practice. In this work, we propose a data-shift-driven probabilistic framework for robust counterfactual explanations that models plausible data shifts via a Wasserstein ball. We formalize a linearized Wasserstein perturbation scheme that captures realistic distributional changes and enables Monte Carlo estimation of CE robustness probabilities under domain-specific data-shift tolerances. Theoretical analysis shows that our framework is equivalent in spirit to parameter-bounding approaches, while offering greater flexibility and avoiding the need to estimate maximal model parameter shifts.
Experiments on real-world datasets demonstrate that the proposed method maintains high robustness of CEs under plausible distribution shifts, outperforming conventional parameter-bounding techniques in both validity and proximity cost.
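The Monte Carlo robustness estimate described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's method: it assumes a logistic model refit under translation-only data shifts, which makes the Wasserstein-2 distance between the original and shifted empirical distributions exactly the translation norm; the function names (`fit_logreg`, `robustness_probability`) and all parameters are hypothetical. The paper's linearized Wasserstein perturbation scheme admits richer shifts than pure translations.

```python
import numpy as np

def fit_logreg(X, y, lr=0.1, iters=500):
    """Fit a logistic regression by gradient descent (bias appended)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def ce_valid(w, x_ce):
    """A CE stays valid if the (re-fit) model still gives it the positive class."""
    return float(np.dot(w[:-1], x_ce) + w[-1]) > 0.0

def robustness_probability(X, y, x_ce, eps, n_trials=200, seed=0):
    """Monte Carlo estimate of the probability that the CE x_ce remains valid
    when the data distribution shifts within a Wasserstein-2 ball of radius eps
    (here restricted to translation shifts, for which W2 equals the shift norm)."""
    rng = np.random.default_rng(seed)
    valid = 0
    for _ in range(n_trials):
        # Random direction with radius <= eps: translating every point by delta
        # moves the empirical distribution by exactly ||delta|| in W2.
        delta = rng.normal(size=X.shape[1])
        delta *= rng.uniform(0.0, eps) / (np.linalg.norm(delta) + 1e-12)
        w_shift = fit_logreg(X + delta, y)  # model update after the data shift
        valid += ce_valid(w_shift, x_ce)
    return valid / n_trials
```

A usage pattern would be to fit a base model, compute a CE for a rejected instance, and then report `robustness_probability` at the shift tolerance eps chosen for the domain; with eps = 0 the estimate reduces to ordinary (non-robust) validity.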
Submission Number: 184