AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

Published: 27 Oct 2023, Last Modified: 29 Nov 2023, NeurIPS XAIA 2023
TL;DR: A controllable environment with synthetic models and data for faithfulness testing and debugging of attribution methods
Abstract: Feature attribution explains neural network outputs by identifying relevant input features. How do we know whether the identified features are indeed relevant to the network? This notion is referred to as _faithfulness_, an essential property that reflects the alignment between the identified (attributed) features and the features actually used by the model. One recent trend for testing faithfulness is to design the data such that we know which input features are relevant to the label and then train a model on the designed data. Subsequently, the identified features are evaluated by comparing them against these designed ground-truth features. However, this idea rests on the assumption that the neural network learns to use _all_ and _only_ these designed features, while there is no guarantee that the learning process trains the network in this way. In this paper, we close this missing link by _explicitly designing the neural network_, manually setting its weights, along with _designing data_, so we know precisely which input features in the dataset are relevant to the designed network. Thus, we can test faithfulness in _AttributionLab_, our designed synthetic environment, which serves as a sanity check and is effective in filtering out attribution methods: if an attribution method is not faithful in a simple controlled environment, it can be unreliable in more complex scenarios. Furthermore, the AttributionLab environment serves as a laboratory for controlled experiments through which we can study feature attribution methods, identify issues, and suggest potential improvements.
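To make the designed-network idea concrete, here is a minimal sketch (not the paper's actual implementation; all names and the toy setup are hypothetical): a linear model whose weights are set by hand rather than learned, so we know exactly which input features it uses, giving a ground-truth attribution to test methods against.

```python
# Toy "designed" environment: data and model are constructed so the
# ground-truth relevant features are known, not merely hoped for.

# Designed data: a 4-feature input where, by construction, only
# features 0 and 2 carry the label signal; 1 and 3 are distractors.
GROUND_TRUTH_MASK = [1, 0, 1, 0]

# Designed model: hand-set weights that read *all and only* the
# ground-truth features (zero weight means the feature is ignored).
WEIGHTS = [2.0, 0.0, 1.0, 0.0]

def designed_model(x):
    """Linear scorer with manually fixed weights (no training step)."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x))

def exact_attribution(x):
    """For a linear model, gradient*input is an exactly faithful
    attribution, serving as the reference answer."""
    return [w * xi for w, xi in zip(WEIGHTS, x)]

x = [1.0, 5.0, 3.0, -2.0]
used = [1 if a != 0 else 0 for a in exact_attribution(x)]
print(used == GROUND_TRUTH_MASK)  # -> True: model uses exactly the designed features
```

Any attribution method run on this model can then be scored against `GROUND_TRUTH_MASK` directly, which is the sanity check the abstract describes.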
Submission Track: Full Paper Track
Application Domain: Computer Vision
Survey Question 1: Our work proposes a controllable environment called AttributionLab that consists of synthetic models and synthetic datasets. AttributionLab provides a new sanity-check paradigm for attribution methods and can serve as a debugging environment. We apply AttributionLab to study the behavior of various attribution methods.
Survey Question 2: Through our work, we aim to identify attribution methods that provide faithful explanations before they are relied on downstream.
Survey Question 3: We applied DeepSHAP, LIME, Integrated Gradients, Occlusion, GradCAM, Extremal Perturbation, Guided Backpropagation, and Information Bottleneck for Attribution (IBA) in our work.
Submission Number: 40