Reproduction study: Towards Transparent and Explainable Attention Models

31 Jan 2021 (modified: 05 May 2023) · ML Reproducibility Challenge 2020 Blind Submission · Readers: Everyone
Keywords: Reproduction, Transparent, Explainable, Attention, Models, LSTM, Diversity, Orthogonal, Natural Language Processing
Abstract: Reproducibility Summary

Scope of Reproducibility: Mohankumar et al. (2020) claim that current attention mechanisms in LSTM-based encoders provide neither a faithful nor a plausible explanation of the model's predictions in Natural Language Processing tasks. To make attention mechanisms more faithful and plausible, the authors propose two modified LSTM models trained with a diversity-driven objective that encourages the hidden representations learned at different time steps to be diverse: the Orthogonal LSTM and the Diversity LSTM. The authors claim that the attention distributions produced by these diversity-driven LSTMs offer more explainability and transparency than those of a vanilla LSTM.

Methodology: We used the authors' original code and retrieved the data from the links they provided. A subset of the datasets from the original paper was used, while preserving the variety of NLP tasks covered in the paper. The experiments were run on the UvA Lisa cluster computer; depending on the dataset, training and evaluation took between 1 and 40 hours. Additionally, the LIME framework was added to the pipeline for an extra experiment.

Results: Our results partially support the authors' claims. Although we were not able to reproduce everything the authors claimed in their paper, there are indications that the proposed diversity-driven LSTMs offer some additional explainability and transparency.

What was easy: The authors' code was relatively easy to run, helped by their clear instructions for setting everything up and running the experiments. Some slight adaptations to the code were needed to suppress warnings, but this was straightforward. Their choice to automatically run and plot all experiments in sequence was convenient for reproducing the work.

What was difficult: Setting up the environment on the remote GPU was slightly difficult. Also, the links for some datasets were broken or missing, making it impossible to verify all results. Some experiments took quite a long time to run, but this was no major issue given the computational resources available to us on the Lisa GPU server.

Communication with original authors: There has been no contact with the original authors of the paper.
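To make the diversity idea behind the Orthogonal LSTM concrete: at each time step the candidate hidden state is decorrelated from the states produced so far, e.g. by subtracting its projection onto the mean of the previous hidden states. The sketch below is our own simplified illustration of that projection step, not the authors' code; the function name and the plain-list vector representation are assumptions.

```python
def orthogonalize_step(h_t, prev_states):
    """Make h_t orthogonal to the mean of previously produced hidden states.

    Simplified sketch of the orthogonalization step in a diversity-driven
    LSTM: subtracting the projection of h_t onto the running mean keeps
    successive hidden states diverse rather than near-duplicates.
    """
    if not prev_states:
        return list(h_t)  # first step: nothing to orthogonalize against
    dim = len(h_t)
    # Mean of all previously emitted hidden states, component-wise.
    mean = [sum(s[i] for s in prev_states) / len(prev_states) for i in range(dim)]
    denom = sum(m * m for m in mean)
    if denom == 0.0:
        return list(h_t)  # degenerate mean vector: leave h_t unchanged
    # Gram-Schmidt style projection coefficient <h_t, mean> / <mean, mean>.
    coeff = sum(h * m for h, m in zip(h_t, mean)) / denom
    return [h - coeff * m for h, m in zip(h_t, mean)]


# Usage: the returned state is orthogonal to the mean of the history.
h_orth = orthogonalize_step([1.0, 0.5], [[1.0, 0.0], [0.0, 1.0]])
```

In the full models this step (or a diversity penalty in the training objective, for the Diversity LSTM) is applied inside the recurrence, so that attention weights over the resulting hidden states are harder to attribute to redundant representations.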
Paper Url: https://openreview.net/forum?id=ykG2B9bWiPXe&referrer=%5BML%20Reproducibility%20Challenge%202020%5D(%2Fgroup%3Fid%3DML_Reproducibility_Challenge%2F2020)