Understanding Attention Training via Output Relevance

16 Aug 2020 (modified: 16 Sept 2020) · OpenReview Anonymous Preprint Blind Submission
Keywords: NLP, ML, Deep learning, training dynamics, attention
TL;DR: We provide a framework to characterize how a model learns attention.
Abstract: In recurrent models with attention, the learned attention weights sometimes correlate with individual token importance, even though the training objective does not explicitly reward this. To understand why, we study the training dynamics of attention for sequence classification and translation. We identify a quantity in the model, which we call the \emph{output relevance}, and show that it drives the learning of attention. If we ablate attention by fixing it to uniform, the output relevance still correlates with the attention of a normally trained model; but if we instead ablate output relevance, attention cannot be learned. Using output relevance, we explain why attention correlates with gradient-based interpretation; and, perhaps surprisingly, why a Seq2Seq model with attention sometimes fails to learn a simple permutation copying task. Finally, we discuss evidence that multi-head attention improves not only expressiveness but also learning dynamics.
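The uniform-attention ablation mentioned in the abstract can be illustrated with a short sketch. This is not the authors' code; it is a minimal hypothetical example assuming a standard dot-product attention step, where setting `fix_uniform=True` forces equal weights over the encoder states instead of using the learned softmax scores.

```python
# Hypothetical sketch (not the paper's implementation): one attention step
# with an option to ablate attention by fixing the weights to uniform.
import torch
import torch.nn.functional as F

def attention_context(query, keys, values, fix_uniform=False):
    """Compute a context vector with dot-product attention.

    query:  (batch, d)     decoder state
    keys:   (batch, T, d)  encoder states used for scoring
    values: (batch, T, d)  encoder states to be aggregated
    fix_uniform: if True, replace the learned attention distribution with
                 a uniform one over the T source positions (the ablation
                 described in the abstract); gradients then flow only
                 through the values, not through the attention scores.
    """
    scores = torch.einsum("bd,btd->bt", query, keys)       # (batch, T)
    if fix_uniform:
        T = keys.size(1)
        weights = torch.full_like(scores, 1.0 / T)         # uniform, independent of scores
    else:
        weights = F.softmax(scores, dim=-1)                # learned attention
    context = torch.einsum("bt,btd->bd", weights, values)  # weighted sum of values
    return context, weights
```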