Attending From Foresight: A Novel Attention Mechanism for Neural Machine Translation

Xintong Li, Lemao Liu, Zhaopeng Tu, Guanlin Li, Shuming Shi, Max Q.-H. Meng

2021 (modified: 30 Nov 2021). IEEE/ACM Trans. Audio Speech Lang. Process. 2021. Readers: Everyone

Abstract: Machine translation (MT) is an essential task in natural language processing and in artificial intelligence more broadly. Statistical machine translation was the dominant approach to MT for decades, but neural machine translation has recently attracted increasing interest because of its appealing model architecture and impressive translation performance. In neural machine translation, an attention model identifies the source words aligned to the next target word (the target foresight word) in order to select the translation context. However, the attention model itself uses no information about this target foresight word. Previous work proposed improving the attention model by explicitly accessing the target foresight word and demonstrated substantial gains on alignment tasks. That approach cannot be applied to translation, where the target foresight word is unavailable at decoding time. This paper proposes several novel enhanced attention models that introduce hidden information about the target foresight word (such as its part of speech) for the translation task. We incorporate this enhanced attention into both recurrent and self-attention-based neural translation models and theoretically justify that such hidden information can make translation prediction easier. Empirical experiments on four datasets further verify that the proposed attention models deliver significant improvements in translation quality.
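To make the idea in the abstract concrete, the following is a minimal sketch of attention whose query is augmented with a soft estimate of a hidden attribute (for example, a part-of-speech tag) of the next target word. It is an illustration under stated assumptions, not the paper's formulation: the function `foresight_attention`, its parameters `tag_weights` and `tag_embeddings`, the dot-product scoring, and the choice to add the expected tag embedding to the decoder state are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def foresight_attention(decoder_state, source_states, tag_weights, tag_embeddings):
    """Dot-product attention with a query enriched by a predicted latent tag.

    Hypothetical sketch: the tag distribution is predicted from the decoder
    state alone, because the next target word is unknown at decoding time.
    """
    # 1. Predict a distribution over latent tags (e.g. POS) of the next word.
    tag_probs = softmax([dot(w, decoder_state) for w in tag_weights])

    # 2. Expected tag embedding under that distribution.
    dim = len(decoder_state)
    expected_tag = [
        sum(p * emb[i] for p, emb in zip(tag_probs, tag_embeddings))
        for i in range(dim)
    ]

    # 3. Assumption: combine by simple addition to form the attention query.
    query = [s + t for s, t in zip(decoder_state, expected_tag)]

    # 4. Attend over the source states and build the context vector.
    weights = softmax([dot(query, h) for h in source_states])
    context = [
        sum(w * h[i] for w, h in zip(weights, source_states))
        for i in range(len(source_states[0]))
    ]
    return weights, context, tag_probs
```

In this sketch the only change relative to plain dot-product attention is step 3: without the expected tag embedding, the query would be the decoder state alone.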
