Track: Regular Track (Page limit: 6-8 pages)
Keywords: Feature Attributions, Explanation Evaluation
Abstract: Feature attribution methods are widely used to explain machine learning models, yet their evaluation is challenging due to competing quality criteria such as faithfulness, robustness, and sparsity. These criteria often conflict, and even alternative formulations of the same metric can yield inconsistent conclusions. We address this by introducing a unifying framework that analyzes systematic incompatibilities between measures of explanation quality. Within this framework, we develop two novel mathematical tools: a sample-wise incompatibility index that quantifies systematic conflicts between criteria, and a generalized eigen-analysis that localizes where tradeoffs are concentrated within attribution results. Experiments on image classifiers show that this analysis provides insights beyond isolated metrics and complements current evaluation practices for feature attributions.
Supplementary Material: pdf
Submission Number: 16
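
The abstract names two tools, a sample-wise incompatibility index and a generalized eigen-analysis of tradeoffs, without defining them here. The sketch below is a hypothetical illustration of what such tools could look like, not the paper's actual method: it assumes the index is a Kendall-style discordance rate between two per-sample quality scores, and that the eigen-analysis solves a generalized symmetric eigenproblem over Gram matrices of per-feature criterion sensitivities (`J_a`, `J_b` are assumed inputs, e.g., from autodiff).

```python
# Hedged sketch of the two tools named in the abstract. The definitions below
# are assumptions made for illustration, not the paper's formulations.
import numpy as np
from scipy.linalg import eigh


def incompatibility_index(scores_a, scores_b):
    """Fraction of sample pairs ranked in opposite order by two criteria.

    scores_a, scores_b: 1-D arrays of per-sample quality scores (higher = better).
    Returns a value in [0, 1]; ~0.5 means no systematic conflict, values near 1
    mean improving one criterion systematically degrades the other.
    (Assumed definition: a Kendall-style discordance rate.)
    """
    a, b = np.asarray(scores_a), np.asarray(scores_b)
    i, j = np.triu_indices(len(a), k=1)  # all unordered sample pairs
    discordant = np.sign(a[i] - a[j]) * np.sign(b[i] - b[j]) < 0
    return discordant.mean()


def tradeoff_directions(J_a, J_b, eps=1e-8):
    """Generalized eigen-analysis localizing where tradeoffs concentrate.

    J_a, J_b: (n_samples, n_features) sensitivities of each quality criterion
    w.r.t. the attribution values (assumed to be available).
    Solves A v = lambda B v with A = J_a^T J_a and B = J_b^T J_b; directions
    with large lambda are dominated by criterion A, small lambda by criterion B.
    """
    A = J_a.T @ J_a
    B = J_b.T @ J_b + eps * np.eye(J_b.shape[1])  # regularize: B must be PD
    return eigh(A, B)  # generalized symmetric eigenproblem


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic scores with a built-in conflict between two criteria.
    faithfulness = rng.normal(size=200)
    sparsity = -0.7 * faithfulness + 0.3 * rng.normal(size=200)
    print(f"incompatibility index: {incompatibility_index(faithfulness, sparsity):.2f}")

    # Synthetic sensitivity matrices over 10 attribution features.
    J_f = rng.normal(size=(200, 10))
    J_s = rng.normal(size=(200, 10))
    eigvals, eigvecs = tradeoff_directions(J_f, J_s)
    print("largest generalized eigenvalues:", np.round(eigvals[-3:], 2))
```

Under these assumptions, an index well above 0.5 flags a systematic conflict between the two criteria, and the extreme generalized eigenvectors point at the feature directions where that conflict is concentrated.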