Gated Position-based Attention for Neural Machine Translation

Anonymous

17 Jun 2023 · ACL ARR 2023 June Blind Submission
Abstract: Attention is a key component of modern neural machine translation architectures. Its effectiveness has been attributed to its ability to model word dependencies based on the similarity of their representations. However, recent work shows that word dependency can be replaced with position dependency with only minor degradation. In this paper, we propose position-based attention, a variant of multi-head attention in which the attention weights are computed from position representations. A naive replacement of token vectors with position vectors in self-attention results in a significant loss in translation quality, which can be recovered by using relative position representations and a gating mechanism. We show analytically that this gating mechanism reintroduces a form of word dependency, and we validate its effectiveness experimentally under various conditions. The resulting network, rPosNet, outperforms all existing position-based approaches and matches Transformer quality while requiring more than 20% fewer attention parameters after training.
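To make the idea concrete, below is a minimal, illustrative sketch of a single-head attention layer whose weights are computed only from clipped relative-position embeddings, combined with a content-dependent sigmoid gate. This is not the authors' rPosNet implementation; the class name, the per-distance scalar logits, the clipping window, and the element-wise gating form are assumptions made for illustration, since the abstract does not specify the exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedPositionAttention(nn.Module):
    """Sketch: attention weights depend only on relative positions;
    a sigmoid gate computed from the token reintroduces word dependency."""

    def __init__(self, d_model: int, max_rel_pos: int = 16):
        super().__init__()
        self.max_rel_pos = max_rel_pos
        # One learnable attention logit per clipped relative distance in [-max, +max].
        self.rel_logits = nn.Embedding(2 * max_rel_pos + 1, 1)
        self.value = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(d_model, d_model)  # content-dependent gate (assumed form)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, n, _ = x.shape
        pos = torch.arange(n, device=x.device)
        # Relative distances, clipped to the embedding range.
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_pos, self.max_rel_pos)
        logits = self.rel_logits(rel + self.max_rel_pos).squeeze(-1)   # (n, n), position-only
        attn = F.softmax(logits, dim=-1)                                # no query/key similarity
        ctx = torch.einsum('ij,bjd->bid', attn, self.value(x))         # positional mixing of values
        # Gating: an element-wise sigmoid gate derived from the token itself,
        # which is where a form of word dependency re-enters the computation.
        g = torch.sigmoid(self.gate(x))
        return self.out(g * ctx)

# Usage example
layer = GatedPositionAttention(d_model=64)
y = layer(torch.randn(2, 10, 64))
print(y.shape)  # torch.Size([2, 10, 64])
```

Note that the position-only attention matrix here is shared across the batch and depends only on sequence length, which is consistent with the abstract's claim that part of the attention parameters can be dropped or precomputed after training; the exact parameter savings reported (over 20%) refer to the paper's full architecture, not this sketch.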
Paper Type: long
Research Area: Machine Translation