Tackling the XAI Disagreement Problem with Adaptive Feature Grouping

ICLR 2026 Conference Submission 19296 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Explainability, Disagreements, Functional Decomposition, Feature Groups
TL;DR: We consider features as groups in order to increase agreement among post-hoc explainability methods.
Abstract: Post-hoc explanations aim to identify which input features (or groups thereof) are the most impactful toward certain model decisions. Many such methods have been proposed (ArchAttribute, Occlusion, SHAP, RISE, LIME, Integrated Gradients), and it is hard for practitioners to understand the differences between them. Even worse, the faithfulness metrics often used to quantitatively compare explanation methods also exhibit inconsistencies. To address these issues, recent work has unified explanation methods through the lens of functional decomposition. We extend this work to scenarios where input features are partitioned into groups (e.g., pixel patches) and prove that disagreements between explanation methods and between faithfulness metrics are caused by between-group interactions. Crucially, eliminating between-group interactions would lead to a single explanation that is optimal according to all faithfulness metrics. We finally show how to reduce the disagreements by adaptively grouping features/pixels on tabular/image data.
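To make the group-level attribution idea concrete, here is a minimal sketch (not the authors' method): occlusion-style attributions computed over feature groups rather than individual features. The model, grouping, and helper `group_occlusion` are illustrative assumptions; the intuition is that placing strongly interacting features in the same group removes between-group interactions that drive disagreement.

```python
import numpy as np

def group_occlusion(model, x, baseline, groups):
    """Attribution of each group = f(x) - f(x with that group set to the baseline)."""
    fx = model(x[None, :])[0]
    attributions = []
    for idx in groups:
        x_masked = x.copy()
        x_masked[idx] = baseline[idx]  # occlude the whole group at once
        attributions.append(fx - model(x_masked[None, :])[0])
    return np.array(attributions)

# Toy model with an interaction between features 0 and 1.
model = lambda X: X[:, 0] * X[:, 1] + X[:, 2]
x, baseline = np.array([1.0, 2.0, 3.0]), np.zeros(3)

# Hypothetical grouping that keeps the interacting features together.
print(group_occlusion(model, x, baseline, groups=[[0, 1], [2]]))  # -> [2. 3.]
```

With the interacting pair {0, 1} grouped, the group attributions add up exactly to f(x) - f(baseline); splitting them into separate groups would reintroduce between-group interactions and method-dependent splits of the interaction term.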
Primary Area: interpretability and explainable AI
Submission Number: 19296