A Dual-Perspective Approach to Evaluating Feature Attribution Methods

TMLR Paper2370 Authors

12 Mar 2024 (modified: 17 Mar 2024)Under review for TMLREveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model’s behavior (i.e., faithfulness). While providing useful insights, existing faithfulness evaluations suffer from shortcomings that we reveal in this paper. To address the limitations of previous evaluations, in this work, we propose two new perspectives within the faithfulness paradigm that reveal intuitive properties: soundness and completeness. Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features. The two perspectives are based on a firm mathematical foundation and provide quantitative metrics that are computable through efficient algorithms. We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Dumitru_Erhan1
Submission Number: 2370
Loading