Abstract: Scope of Reproducibility
Learning to Deceive with Attention-Based Explanations by Pruthi et al. makes two claims, both of which we attempt to reproduce in this work. Their first claim is that attention weights can be manipulated to shift mass away from a predefined set of impermissible tokens without a significant change in performance. Their second claim is that these manipulated models still rely on information from the set of impermissible tokens. In this reproducibility report, we rerun their experiments and argue that the results support the claims made by Pruthi et al.
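To make the first claim concrete, the sketch below shows one way such a manipulated training objective can be written: the task loss is augmented with a log-barrier penalty on the total attention mass assigned to impermissible tokens. This is a minimal illustration, assuming a penalty of the form used by Pruthi et al.; the names `attn_weights`, `impermissible_mask`, and `lambda_penalty` are ours, not taken from the original code.

```python
import torch

def deceptive_attention_loss(task_loss, attn_weights, impermissible_mask,
                             lambda_penalty=0.1, eps=1e-12):
    """Augment a task loss with a penalty on attention to impermissible tokens.

    attn_weights:       (batch, seq_len) attention distribution per example
    impermissible_mask: (batch, seq_len) 1.0 at impermissible tokens, else 0.0
    """
    # Total attention mass placed on impermissible tokens per example.
    impermissible_mass = (attn_weights * impermissible_mask).sum(dim=-1)
    # -log(1 - mass) grows without bound as the model attends more heavily
    # to impermissible tokens, pushing that mass toward zero during training.
    penalty = -torch.log(1.0 - impermissible_mass + eps).mean()
    return task_loss + lambda_penalty * penalty
```

In practice this penalty is minimized jointly with the classification loss, so the model learns attention maps that appear to ignore the impermissible tokens while the rest of the network may still exploit them.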
Methodology
In this report we run three of the classification tasks used by Pruthi et al., as the Reference Letters dataset is not publicly available. We extend this part by adding the multiclass version of the sentiment classification dataset. We evaluate these tasks with an embedding-based model, a BiLSTM, and BERT. In addition, we run the experiments on the four sequence-to-sequence tasks with a GRU.
Results
Most of the results were reproducible, with a difference of less than 1% in accuracy from those reported in the original paper. Therefore, we argue that this report supports the claims made by Pruthi et al.
What was easy
The authors open-sourced their code and responded quickly to questions by e-mail. Furthermore, the experiments can be run in a reasonable amount of time on a GPU.
What was difficult
Although the code was open-sourced, the code for some experiments was missing, as were the instructions for running certain experiments. Furthermore, the dataset for one of the classification tasks, namely the Reference Letters dataset, is not publicly available.
Communication with original authors
We had brief contact with the original authors, asking for the missing code. They responded quickly and provided the missing code.
Paper URL: https://openreview.net/forum?id=ZVxchkVPa8S&noteId=W0rq7riyE5f