Towards Attributions of Input Variables in a Coalition

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: This paper theoretically reveals the internal mechanism behind the conflict between attributions computed under different partitions of input variables.
Abstract: This paper focuses on the fundamental challenge of partitioning input variables in attribution methods for Explainable AI, particularly in Shapley value-based approaches. Previous methods typically compute attributions over a predefined partition of input variables but offer no theoretical guidance on how to form meaningful partitions. We identify that attribution conflicts arise when the attribution of a coalition differs from the sum of its individual variables' attributions. To address this, we analyze the numerical effects of AND-OR interactions in AI models and extend the Shapley value to a new attribution metric for variable coalitions. Our theoretical findings reveal that specific interactions cause attribution conflicts, and we propose three metrics to evaluate coalition faithfulness. Experiments on synthetic data, NLP, image classification, and the game of Go validate our approach, demonstrating consistency with human intuition and practical applicability.
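To make the notion of an attribution conflict concrete, the sketch below constructs a toy game in which an interaction crosses the coalition boundary, so the standard Shapley value of the merged coalition {1, 2} differs from the sum of the individual Shapley values of variables 1 and 2. The toy function f, the zero baseline, and the merged-player construction are illustrative assumptions for this example only; they are not the coalition attribution metric proposed in the paper.

```python
import itertools

# Toy model with an OR-style interaction that crosses the coalition boundary:
# the output fires only when variable 3 is present together with 1 OR 2.
def f(active):
    # `active` is the set of input variables that are "on"; the rest are masked
    # to a baseline of 0 (an illustrative masking choice, not the paper's).
    return 1.0 if 3 in active and (1 in active or 2 in active) else 0.0

def shapley(players, value):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    phi = {p: 0.0 for p in players}
    perms = list(itertools.permutations(players))
    for order in perms:
        seen = set()
        for p in order:
            phi[p] += value(seen | {p}) - value(seen)
            seen.add(p)
    return {p: v / len(perms) for p, v in phi.items()}

# Attributions of the individual variables 1, 2, 3.
phi = shapley([1, 2, 3], f)
print(phi)              # roughly {1: 0.167, 2: 0.167, 3: 0.667}
print(phi[1] + phi[2])  # 0.333... : sum of the two coalition members' attributions

# Now treat {1, 2} as a single merged player "A" and recompute the Shapley value.
def value_merged(S):
    expanded = set()
    for p in S:
        expanded |= {1, 2} if p == "A" else {p}
    return f(expanded)

phi_merged = shapley(["A", 3], value_merged)
print(phi_merged["A"])  # 0.5 -- differs from 0.333..., i.e., an attribution conflict
```

In this example the conflict is driven by the OR interaction between the coalition {1, 2} and the outside variable 3, which is the kind of cross-boundary interaction effect the paper analyzes when characterizing when a coalition's attribution is faithful.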
Lay Summary: We want to make AI models easier to understand by explaining why they make certain decisions — for example, why a model thinks a photo shows a dog rather than a cat. One popular method is to see how much each part of the input (like each word or pixel) contributes to the final decision. But there's a problem: when we group inputs together, the total contribution often doesn't match what you'd expect from the individual parts, which leads to confusion. In this work, we explore why this mismatch happens and how different kinds of logic inside AI models affect it. We build on a well-known method called the Shapley value and create a new way to fairly measure how groups of inputs (called coalitions) contribute to an outcome. We also introduce three ways to check whether these group contributions make sense. We tested our ideas on examples ranging from simple simulations to language tasks, image recognition, and even the game of Go. Our method gave results that aligned better with human intuition, and it could help researchers and developers build more trustworthy and interpretable AI systems.
Primary Area: Social Aspects->Accountability, Transparency, and Interpretability
Keywords: Attribution methods, Shapley value
Submission Number: 15947