Joint source-target encoding with pervasive attention

Maha Elbayad, Laurent Besacier, Jakob Verbeek

2021 (modified: 04 Oct 2022), Mach. Transl. 2021

Abstract: The pervasive attention model is a sequence-to-sequence model that addresses the issue of source–target interaction in encoder–decoder models by jointly encoding the two sequences with a two-dimensional convolutional neural network. We investigate different design choices for each building block of Pervasive Attention and study their impact on the predictive strength of the model. These include different types of layer connectivity, depth of the networks, the filter sizes, and source aggregation mechanisms. Machine translation experiments on the IWSLT’14 De→En, IWSLT’15 En→Vi, WMT’16 En→Ro and WMT’15 De→En datasets show results competitive with state-of-the-art encoder–decoder models, outperforming Transformer models on three of the four tested datasets.
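
To make the idea of joint encoding concrete, the following is a minimal sketch in pure Python, not the authors' implementation. It assumes a single convolutional layer with no nonlinearity, target embeddings that are already shifted right so that row i only describes earlier target tokens, and max-pooling as the source aggregation mechanism. All function names (`joint_grid`, `masked_conv2d`, `max_pool_source`, `encode`, `random_vectors`) and all sizes are hypothetical and chosen only for illustration.

```python
import random

def joint_grid(src_emb, tgt_emb):
    """Cell (i, j) concatenates target embedding i with source embedding j."""
    return [[t + s for s in src_emb] for t in tgt_emb]

def masked_conv2d(grid, weight, bias):
    """2D convolution that is causal along the target axis.

    weight[a][b][o] is the list of input-channel weights for output channel o,
    where a counts steps back along the target axis (0 = current row) and b
    spans a centered window along the source axis. Out-of-range cells are
    skipped, which is equivalent to zero padding.
    """
    n_tgt, n_src = len(grid), len(grid[0])
    k_tgt, k_src = len(weight), len(weight[0])
    half = k_src // 2
    out = [[list(bias) for _ in range(n_src)] for _ in range(n_tgt)]
    for i in range(n_tgt):
        for j in range(n_src):
            for a in range(k_tgt):
                ii = i - a  # never looks at later target positions
                if ii < 0:
                    continue
                for b in range(k_src):
                    jj = j + b - half  # source context on both sides
                    if not 0 <= jj < n_src:
                        continue
                    cell = grid[ii][jj]
                    for o in range(len(bias)):
                        out[i][j][o] += sum(
                            w * x for w, x in zip(weight[a][b][o], cell)
                        )
    return out

def max_pool_source(features):
    """Collapse the source axis: per target position, channel-wise maximum."""
    return [[max(channel) for channel in zip(*row)] for row in features]

def encode(src_emb, tgt_emb, weight, bias):
    """Joint grid -> masked convolution -> source aggregation."""
    grid = joint_grid(src_emb, tgt_emb)
    return max_pool_source(masked_conv2d(grid, weight, bias))

def random_vectors(rng, n, dim):
    """Hypothetical helper: n random vectors of the given dimension."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]

# Illustrative sizes (assumptions): 4 source tokens, 3 target tokens,
# embedding dimension 2, so each grid cell has 4 input channels.
rng = random.Random(0)
src = random_vectors(rng, 4, 2)
tgt = random_vectors(rng, 3, 2)
# Kernel covers 2 target steps and 3 source steps, with 5 output channels.
weight = [[random_vectors(rng, 5, 4) for _ in range(3)] for _ in range(2)]
bias = [0.0] * 5
features = encode(src, tgt, weight, bias)
print(len(features), len(features[0]))
```

The result holds one feature vector per target position, which a real model would pass to an output layer to predict the next token. Because the kernel never reaches forward along the target axis, changing a later target embedding leaves the features of earlier target positions unchanged, which is the property that permits autoregressive decoding.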
