- Abstract: Prior researches suggest that attentional neural machine translation (NMT) is able to capture word alignment by attention, however, to our surprise, it almost fails for NMT models with multiple attentional layers except for those with a single layer. This paper introduce two methods to induce word alignment from general neural machine translation models. Experiments verify that both methods obtain much better word alignment than the method by attention. Furthermore, based on one of the proposed method, we design a criterion to divide target words into two categories (i.e. those mostly contributed from source "CFS" words and the other words mostly contributed from target "CFT" words), and analyze word alignment under these two categories in depth. We find that although NMT models are difficult to capture word alignment for CFT words but these words do not sacrifice translation quality significantly, which provides an explanation why NMT is more successful for translation yet worse for word alignment compared to statistical machine translation. We further demonstrate that word alignment errors for CFS words are responsible for translation errors in some extent by measuring the correlation between word alignment and translation for several NMT systems.
- Keywords: Neural Machine Translation, Word Alignment, Neural Network, Pointwise Mutual Information
- TL;DR: It proposes methods to induce word alignment for neural machine translation (NMT) and uses them to interpret the relationship between NMT and word alignment.