Abstract: Recent studies on interpretability of attention distributions have led to notions of faithful and plausible explanations for a model’s predictions. Attention distributions can be considered a faithful explanation if a higher attention weight implies a greater impact on the model’s prediction. They can be considered a plausible explanation if they provide a human-understandable justification for the model’s predictions. In this work, we first explain why current attention mechanisms in LSTM-based encoders can provide neither a faithful nor a plausible explanation of the model’s predictions. We observe that in LSTM-based encoders the hidden representations at different time steps are very similar to each other (high conicity), and attention weights in these situations do not carry much meaning because even a random permutation of the attention weights does not affect the model’s predictions. Based on experiments on a wide variety of tasks and datasets, we observe that attention distributions often attribute the model’s predictions to unimportant words such as punctuation and fail to offer a plausible explanation for the predictions. To make attention mechanisms more faithful and plausible, we propose a modified LSTM cell with a diversity-driven training objective that ensures that the hidden representations learned at different time steps are diverse. We show that the resulting attention distributions offer more transparency as they (i) provide a more precise importance ranking of the hidden states, (ii) are better indicative of words important for the model’s predictions, and (iii) correlate better with gradient-based attribution methods. Human evaluations indicate that the attention distributions learned by our model offer a plausible explanation of the model’s predictions. Our code has been made publicly available at https://github.com/akashkm99/Interpretable-Attention
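As a rough illustration of the conicity measure and the diversity-driven objective mentioned in the abstract, the sketch below computes the mean cosine similarity between each hidden state and the mean of all hidden states, and adds it as a penalty to the task loss. This is a minimal sketch assuming a PyTorch setup; the function names and the `lambda_div` weight are illustrative assumptions, not the authors’ released implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def conicity(hidden_states: torch.Tensor) -> torch.Tensor:
    """Mean cosine similarity of each hidden state to the mean hidden state.

    hidden_states: tensor of shape (seq_len, hidden_dim) holding the encoder
    outputs for one sequence. Values near 1 indicate the states all point in
    roughly the same direction (a narrow cone); lower values indicate more
    diverse representations.
    """
    mean_state = hidden_states.mean(dim=0, keepdim=True)                 # (1, hidden_dim)
    alignment = F.cosine_similarity(hidden_states, mean_state, dim=-1)   # (seq_len,)
    return alignment.mean()

def diversity_loss(task_loss: torch.Tensor,
                   hidden_states: torch.Tensor,
                   lambda_div: float = 0.5) -> torch.Tensor:
    """Penalize high conicity so hidden states at different time steps stay diverse.

    `lambda_div` is a hypothetical hyperparameter name for the penalty weight.
    """
    return task_loss + lambda_div * conicity(hidden_states)
```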