Abstract: Most state-of-the-art neural machine translation systems, despite being different
in architectural skeletons (e.g., recurrence, convolutional), share an indispensable
feature: the Attention. However, most existing attention methods are token-based
and ignore the importance of phrasal alignments, the key ingredient for the success
of phrase-based statistical machine translation. In this paper, we propose
novel phrase-based attention methods to model n-grams of tokens as attention
entities. We incorporate our phrase-based attentions into the recently proposed
Transformer network, and demonstrate that our approach yields improvements of
1.3 BLEU for English-to-German and 0.5 BLEU for German-to-English translation
tasks, and 1.75 and 1.35 BLEU points in English-to-Russian and Russian-to-English translation tasks
on WMT newstest2014 using WMT’16 training data.
Keywords: neural machine translation, natural language processing, attention, transformer, seq2seq, phrase-based, phrase, n-gram
TL;DR: Phrase-based attention mechanisms to assign attention on phrases, achieving token-to-phrase, phrase-to-token, phrase-to-phrase attention alignments, in addition to existing token-to-token attentions.
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/phrase-based-attentions/code)
17 Replies
Loading