Keywords: interpretability, explainability, feature attribution, evaluation of interpretability
TL;DR: We "unit test" several popular feature attribution algorithms for CV and NLP models against known highly important features to see if they can identify them; (un)surprisingly, they mostly can't.
Abstract: Feature attribution methods are exceedingly popular in interpretable machine learning. They aim to assign each input feature a score representing its importance, but there is no consensus on the definition of "attribution", leading to many competing methods with little systematic evaluation. The lack of ground truth for feature attribution particularly complicates evaluation; to address this, we propose a dataset modification procedure that constructs attribution ground truth. Using this procedure, we evaluate three common interpretability methods: saliency maps, rationales, and attention. We identify several deficiencies and add new perspectives to the growing body of evidence questioning the correctness and reliability of these methods in the wild. Our evaluation approach is model-agnostic and can also be used to assess future feature attribution method proposals. Code is available at https://github.com/YilunZhou/feature-attribution-evaluation.
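To make the evaluation idea concrete, here is a minimal sketch of the "construct attribution ground truth, then unit test the attribution method" recipe on synthetic tabular data. The paper itself works with CV and NLP models; the logistic-regression model, the gradient-times-input attribution, and all variable names below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dataset modification: plant a feature (column 0) that fully
# determines the label; the remaining features are pure noise.
# This gives us attribution ground truth: a correct method should
# rank feature 0 as the most important input.
n, d = 500, 10
X = rng.normal(size=(n, d))
y = (rng.random(n) < 0.5).astype(float)
X[:, 0] = 2 * y - 1 + 0.1 * rng.normal(size=n)  # known important feature

# Train a tiny logistic-regression model by gradient descent.
w = np.zeros(d)
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / n

# Gradient-times-input attribution (a stand-in for a saliency map):
# for a linear logit, the gradient w.r.t. each feature is its weight.
attributions = np.abs(X * w)

# "Unit test": how often does the attribution method rank the
# planted ground-truth feature as the single most important one?
top1 = (attributions.argmax(axis=1) == 0).mean()
print(f"ground-truth feature ranked most important: {top1:.0%}")
```

For a well-behaved attribution method on this trivially separable dataset, the top-1 agreement with the planted feature should be near 100%; the paper's finding is that on realistic models and data, popular methods often fail analogous checks.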
Community Implementations: [1 code implementation on CatalyzeX](https://www.catalyzex.com/paper/arxiv:2104.14403/code)