Keywords: Explainability, Feature Attribution, Explainable AI (XAI), Shapley Values, Local Interpretability, Model-Agnostic Methods
Abstract: Feature attribution methods based on Shapley values, such as the popular
SHAP framework, are built on strong axiomatic foundations but suffer
from a critical, previously underappreciated flaw: global dependence. As
recent impossibility theorems demonstrate, this vulnerability is not merely an estimation issue but a fundamental one: the feature attributions for a local instance can be arbitrarily manipulated by modifying the model's behavior in regions of the feature space far from that instance, rendering the resulting Shapley values semantically unstable and potentially misleading.
This paper introduces VARSHAP, a novel feature attribution method that
directly solves this problem. We argue that the flaw originates in the characteristic function used in the Shapley game: the model's output itself. VARSHAP redefines this game by using the reduction of local prediction variance as the characteristic function. This choice makes the method, by construction, independent of the model's global behavior and yields a truly local explanation. VARSHAP retains the
desirable axiomatic properties of the Shapley framework while ensuring
that the resulting attributions are robust and faithful to the model's
local decision landscape. Experiments on synthetic and real-world
datasets confirm our theoretical claims: VARSHAP provides stable explanations under global data shifts where standard methods fail, and it achieves superior performance, particularly on robustness and complexity metrics.
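The abstract does not include the paper's exact formulation, so the following is only a minimal sketch of the stated idea: a Shapley game whose characteristic function v(S) is the amount of local prediction variance removed by clamping the features in coalition S to the instance's values. The Gaussian neighborhood, the `sigma` parameter, the function name `varshap`, and the exhaustive coalition enumeration are illustrative assumptions, not the paper's implementation.

```python
import itertools
import math

import numpy as np


def varshap(model, x, sigma=0.1, n_samples=2000, seed=0):
    """Exact Shapley values for a variance-reduction game (small d only).

    v(S) = Var[f(Z)] - Var[f(Z with features in S clamped to x)],
    where Z ~ N(x, sigma^2 I) is a local perturbation of x. Note that
    v(empty set) = 0 and v(all features) = Var[f(Z)], so by the
    efficiency axiom the attributions sum to the total local variance.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    Z = rng.normal(x, sigma, size=(n_samples, d))  # shared local neighborhood
    var_total = np.var(model(Z))

    def v(S):
        # Local prediction variance removed by fixing coalition S at x.
        ZS = Z.copy()
        if S:
            ZS[:, list(S)] = x[list(S)]
        return var_total - np.var(model(ZS))

    # Exhaustive Shapley formula: 2^(d-1) coalitions per feature,
    # so this is for exposition only, not for high-dimensional use.
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for S in itertools.combinations(others, k):
                w = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
                phi[i] += w * (v(S + (i,)) - v(S))
    return phi


# Example: f(x) = x0^2 + 3*x1; near x = (2, 0), x0 drives most of the
# local variance, so it should receive the larger attribution.
f = lambda X: X[:, 0] ** 2 + 3.0 * X[:, 1]
print(varshap(f, [2.0, 0.0], sigma=0.1))
```

Because v(S) depends only on the model's predictions inside the sampled neighborhood, modifying the model far from x cannot change any coalition's value, which is one way to realize the locality-by-construction property the abstract describes.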
Primary Area: interpretability and explainable AI
Submission Number: 19556