Improving Causal Explanations
Keywords: causality, XAI, explanation, interpretability, feature attribution
TL;DR: We propose a causation-agnostic framework for evaluating causal attributions with four desiderata from philosophy and cognitive science, and introduce counterfactual Shapley values that satisfy them and outperform prior methods empirically.
Abstract: Answering "Why did this outcome occur?" is central to the empirical sciences and to explainable artificial intelligence (XAI).
However, XAI lacks a principled framework for evaluating explanation methods.
Following Lewis's view of explanations as summaries of causal histories distinct from causation itself, we formalize four desiderata agnostic to the precise definition of causation: causal admissibility (non-causes receive zero attribution), explanatory power (causes receive non-zero attribution), normality (attribution proportional to baseline normality), and effect responsivity (attribution proportional to effect magnitude).
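One plausible formal reading of these desiderata, in notation assumed here rather than taken from the paper ($\phi_j$ the attribution of feature $j$, $C$ the cause set under a fixed definition of causation, $\nu_j$ the normality of the baseline for $j$, and $\delta_j$ the magnitude of $j$'s effect on the outcome):

$$
\begin{aligned}
&\text{causal admissibility:} && j \notin C \implies \phi_j = 0, \\
&\text{explanatory power:} && j \in C \implies \phi_j \neq 0, \\
&\text{normality:} && |\phi_j| \propto \nu_j, \\
&\text{effect responsivity:} && |\phi_j| \propto \delta_j.
\end{aligned}
$$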
We establish conditions for inferring desiderata violations on unknown causation oracles from known causation proxies.
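As one illustration of such a condition (again in our notation, not necessarily the paper's): if the proxy's cause set $C'$ over-approximates the oracle's cause set $C$, as when functional dependence is a necessary condition for causation, then a causal-admissibility violation witnessed on the proxy transfers to the oracle, since $j \notin C'$ and $C \subseteq C'$ together imply $j \notin C$:

$$
C \subseteq C' \;\text{ and }\; \exists\, j \notin C' : \phi_j \neq 0 \;\implies\; \exists\, j \notin C : \phi_j \neq 0.
$$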
We introduce counterfactual Shapley values ($L_3$ SVs), extending unit-level total effects with principled baseline selection.
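As a sketch of the general recipe, not the paper's algorithm: counterfactual Shapley values combine the standard Shapley weighting with a value function given by unit-level counterfactual outcomes, where coalition members keep their factual values, the remaining features are intervened to a chosen baseline, and exogenous noise is abducted from the observed unit. A minimal Python illustration with a hypothetical structural equation and assumed (known) abducted noise:

```python
import itertools
import math

# Hypothetical structural equation for the outcome Y; u_y is the exogenous
# noise abducted from the observed unit (assumed known here for simplicity).
def outcome(x1, x2, u_y):
    return x1 + 2.0 * x2 + u_y

def value(S, x_fact, x_base, u_y):
    """Counterfactual value of coalition S: the unit-level outcome when
    features in S are intervened to their factual values, the rest to
    baseline values, holding the abducted noise fixed."""
    x = [x_fact[i] if i in S else x_base[i] for i in range(len(x_fact))]
    return outcome(x[0], x[1], u_y)

def counterfactual_shapley(x_fact, x_base, u_y):
    """Exact Shapley values over the counterfactual value function,
    by enumerating all coalitions of the other features."""
    n = len(x_fact)
    phi = [0.0] * n
    for j in range(n):
        others = [i for i in range(n) if i != j]
        for r in range(n):
            for S in itertools.combinations(others, r):
                w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                gain = (value(set(S) | {j}, x_fact, x_base, u_y)
                        - value(set(S), x_fact, x_base, u_y))
                phi[j] += w * gain
    return phi

# Factual unit vs. a "normal" baseline; attributions sum to the total effect:
# outcome(1, 1.5, 0) - outcome(0, 0, 0) = 4.0.
print(counterfactual_shapley(x_fact=(1.0, 1.5), x_base=(0.0, 0.0), u_y=0.0))
# -> [1.0, 3.0]
```

In this toy linear case the attributions decompose the unit-level total effect of moving all features from baseline to factual values; the baseline choice is exactly where the normality desideratum enters.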
We prove $L_3$ SVs uniquely satisfy all desiderata under functional dependence as a causation proxy and provide a sound bounding algorithm.
Experiments demonstrate that $L_3$ SVs are the first method to satisfy all desiderata on the proxy and correctly discriminate causal structures where existing methods fail.
Primary Area: interpretability and explainable AI
Submission Number: 25591