Seeing Through Distractions: Stable Attribution via the Core

Published: 07 Jun 2026, Last Modified: 07 Jun 2026 | ICML 2026 Workshop | CC BY 4.0
Keywords: Coalitional game theory, least-core, Shapley values, Explainable AI, Computer vision
Abstract: Shapley-value-based explanations are a standard approach to feature attribution in machine learning. Yet, in many realistic settings, particularly in computer vision, groups of features can act as spurious contextual cues and bias a classifier toward a label. In such cases, Shapley values may systematically overestimate the importance of these groups. We formalize this effect through the notion of a _contextual distractor_. We show that, for a broad family of non-convex cooperative games, the least-core assigns more appropriate attribution to such distractor groups than general semivalue-based methods, including Shapley, Banzhaf, and weighted variants thereof. We derive explicit conditions under which this gap emerges, thereby identifying regimes in which the least-core provides a stable explanation while averaging-based attributions can be misleading. We complement our theory with experiments on an image-classification task, where the assumptions are verified empirically and the observed behavior aligns with our theoretical predictions.
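Illustrative sketch (not from the paper): the two kinds of attribution contrasted in the abstract can be compared on a toy game. The code below computes exact Shapley values and the maximum coalition excess, which is the quantity the least-core minimizes over efficient allocations. The three-player "glove game" and the names `glove_game`, `shapley_values`, and `max_excess` are assumptions chosen for illustration. This game is non-convex, but it is not claimed to satisfy the paper's definition of a contextual distractor.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def glove_game(coalition):
    """Hypothetical toy game: player 0 holds a left glove, players 1 and 2
    each hold a right glove. A coalition is worth 1 if it can form a pair."""
    members = set(coalition)
    has_pair = 0 in members and (1 in members or 2 in members)
    return Fraction(1) if has_pair else Fraction(0)


def shapley_values(n, v):
    """Exact Shapley values by enumerating every coalition (small n only)."""
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = Fraction(0)
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for subset in combinations(others, size):
                # Marginal contribution of player i when joining `subset`.
                phi += weight * (v(subset + (i,)) - v(subset))
        values.append(phi)
    return values


def max_excess(n, v, allocation):
    """Largest v(S) - x(S) over nonempty proper coalitions S.
    The least-core is the set of efficient allocations minimizing this value."""
    worst = None
    for size in range(1, n):
        for subset in combinations(range(n), size):
            excess = v(subset) - sum(allocation[j] for j in subset)
            if worst is None or excess > worst:
                worst = excess
    return worst


phi = shapley_values(3, glove_game)
core_point = [Fraction(1), Fraction(0), Fraction(0)]
print("Shapley values:", [str(p) for p in phi])
print("Max excess of Shapley allocation:", max_excess(3, glove_game, phi))
print("Max excess of (1, 0, 0):", max_excess(3, glove_game, core_point))
```

In this toy game the Shapley allocation is (2/3, 1/6, 1/6) and leaves a maximum excess of 1/6, so some coalition could still object to it. The allocation (1, 0, 0) has maximum excess 0 and is the unique core (and least-core) point. For larger games the least-core would normally be found with a linear-programming solver rather than checked by hand.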
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Paper Type: Standard paper
Submission Number: 32