Keywords: Explainability, Disagreements, Functional Decomposition, Feature Groups
TL;DR: We group input features in order to increase agreement among post-hoc explainability methods.
Abstract: Post-hoc explanations aim at understanding which input features (or groups thereof) are the most impactful toward certain model decisions.
Many such methods have been proposed (ArchAttribute, Occlusion, SHAP, RISE, LIME, Integrated Gradients), and it is hard for practitioners
to understand the differences between them. Even worse, faithfulness metrics, often used to quantitatively compare explanation methods,
also exhibit inconsistencies. To address these issues, recent work has unified explanation methods
through the lens of Functional Decomposition. We extend such work to scenarios where input features are partitioned into groups
(e.g. pixel patches) and prove that disagreements between explanation methods and faithfulness metrics are caused by between-group
interactions. Crucially, eliminating between-group interactions would yield a single explanation that is optimal according to all faithfulness metrics. Finally, we show how to reduce these disagreements by adaptively grouping features/pixels on tabular/image data.
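To make the idea of group-level explanations and their (dis)agreement concrete, here is a minimal, hypothetical sketch (not the paper's code): it compares attributions from two simple post-hoc explainers, Occlusion and a LIME-style linear surrogate, computed over feature groups (pixel patches) versus individual pixels, and measures their agreement with a rank correlation. The toy model, grouping scheme, and all function names are assumptions for illustration only.

```python
# Hypothetical illustration: group-level attributions and agreement between
# two explainers (Occlusion vs. a LIME-style surrogate). Not the paper's code.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

def model(x_flat):
    """Toy black-box scoring function over a flattened 8x8 'image'."""
    w = np.linspace(-1.0, 1.0, x_flat.shape[-1])
    return x_flat @ w + 0.5 * x_flat[..., 0] * x_flat[..., 1]  # mild interaction

def make_groups(side=8, patch=4):
    """Partition a (side x side) image into non-overlapping (patch x patch) pixel groups."""
    idx = np.arange(side * side).reshape(side, side)
    return [idx[i:i + patch, j:j + patch].ravel()
            for i in range(0, side, patch)
            for j in range(0, side, patch)]

def occlusion_attr(x, groups, baseline=0.0):
    """Attribution of each group = score drop when that group is set to a baseline."""
    base_score = model(x)
    attrs = []
    for g in groups:
        x_occ = x.copy()
        x_occ[g] = baseline
        attrs.append(base_score - model(x_occ))
    return np.array(attrs)

def lime_style_attr(x, groups, n_samples=2000, baseline=0.0):
    """LIME-style attribution: fit a linear model on random group on/off masks."""
    masks = rng.integers(0, 2, size=(n_samples, len(groups)))
    scores = np.empty(n_samples)
    for s, m in enumerate(masks):
        x_pert = x.copy()
        for g, keep in zip(groups, m):
            if not keep:
                x_pert[g] = baseline
        scores[s] = model(x_pert)
    design = np.column_stack([masks, np.ones(n_samples)])  # add intercept
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return coef[:-1]  # drop intercept

x = rng.normal(size=64)

patches = make_groups()
rho_patches, _ = spearmanr(occlusion_attr(x, patches), lime_style_attr(x, patches))

pixels = [np.array([i]) for i in range(64)]
rho_pixels, _ = spearmanr(occlusion_attr(x, pixels), lime_style_attr(x, pixels))

print(f"rank agreement on 4x4 patches: {rho_patches:.2f}; on single pixels: {rho_pixels:.2f}")
```

This sketch only illustrates how agreement between explainers can be measured at different grouping granularities; it does not reproduce the paper's adaptive grouping procedure or its results.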
Primary Area: interpretability and explainable AI
Submission Number: 19296