AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: visualization or interpretation of learned representations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: feature attribution, explainable AI, explainability, synthetic data
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A controllable environment with a synthetic model and data for faithfulness testing and debugging of attribution methods
Abstract: Feature attribution explains neural network outputs by identifying relevant input features. How do we know whether the identified features are indeed relevant to the network? This property is referred to as _faithfulness_: the alignment between the identified (attributed) features and the features the model actually uses. One recent approach to testing faithfulness is to design the data such that we know which input features are relevant to the label, and then train a model on that data. The identified features are subsequently evaluated by comparing them with these designed ground-truth features. However, this approach rests on the assumption that the neural network learns to use _all_ and _only_ the designed features, and there is no guarantee that training produces such a network. In this paper, we close this gap by _explicitly designing the neural network_, manually setting its weights, in addition to _designing the data_, so that we know precisely which input features are relevant to the designed network. This lets us test faithfulness in _AttributionLab_, our synthetic environment, which serves as a sanity check and effectively filters out unfaithful attribution methods: if an attribution method is not faithful in a simple controlled environment, it can be unreliable in more complex scenarios. Furthermore, the AttributionLab environment serves as a laboratory for controlled experiments through which we can study feature attribution methods, identify issues, and suggest potential improvements.
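To make the evaluation protocol concrete, here is a minimal, hypothetical sketch of the core idea, not the paper's actual AttributionLab design: a linear network whose weights are set by hand so that only the first `k` input features can influence the output, making the ground-truth relevant feature set known by construction. A gradient-times-input attribution is then compared against that designed ground truth. All names (`d`, `k`, the choice of attribution method) are illustrative assumptions.

```python
import torch

# Hypothetical toy setup (not the paper's actual designed network):
# a linear model whose weights are set BY HAND so that only the first k
# input features can influence the output. The ground-truth relevant
# feature set is therefore known by construction, not assumed.
d, k = 8, 3                                  # input dimension, number of relevant features
model = torch.nn.Linear(d, 1, bias=False)
with torch.no_grad():
    weights = torch.zeros(1, d)
    weights[0, :k] = 1.0                     # only features 0..k-1 are used
    model.weight.copy_(weights)

x = torch.randn(1, d, requires_grad=True)    # one synthetic input
model(x).backward()                          # scalar output, so backward() needs no grad arg

attribution = (x.grad * x).squeeze(0)        # gradient-times-input attribution
ground_truth = torch.zeros(d, dtype=torch.bool)
ground_truth[:k] = True                      # designed relevant features

# Faithfulness check: do the top-k attributed features match the design?
top_k = attribution.abs().topk(k).indices
precision = ground_truth[top_k].float().mean().item()
print(f"top-{k} attribution precision vs. designed ground truth: {precision:.2f}")
```

Because the weights are set manually rather than learned, a faithful attribution must recover exactly features 0 through k-1. The paper's designed networks and data are more elaborate; this sketch only illustrates the idea of comparing attributions against ground truth that holds by construction.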
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5542