- TL;DR: We train a Transformer encoder to perform translation in the style of a masked translation model.
- Abstract: We introduce the masked translation model (MTM), which combines encoding and decoding of sequences within the same model component. The MTM is based on the idea of masked language modeling and supports both autoregressive and non-autoregressive decoding strategies by simply changing the order of masking. In experiments on the WMT 2016 Romanian-English task, the MTM shows strong constant-time translation performance, outperforming all related approaches with comparable complexity. We also extensively compare various decoding strategies supported by the MTM, as well as several length modeling techniques and training settings.
- Keywords: Neural Machine Translation, Non-Autoregressive Decoding, Deep Learning, Transformer
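
The abstract's claim that autoregressive and non-autoregressive decoding differ only in the order of masking can be illustrated with a small sketch. This is not the authors' implementation: the names `unmasking_schedule`, `decode`, and `predict`, the `<mask>` token string, and the three strategy labels are hypothetical, and the target length is assumed to be known in advance (the paper treats length modeling separately).

```python
from typing import Callable, List

MASK = "<mask>"  # hypothetical mask token, chosen for illustration only


def unmasking_schedule(length: int, strategy: str) -> List[List[int]]:
    """Return, for each decoding step, the target positions unmasked in that step."""
    if strategy == "left_to_right":
        # Autoregressive-style order: one position per step, left to right.
        return [[i] for i in range(length)]
    if strategy == "right_to_left":
        # Same cost, reversed order.
        return [[i] for i in reversed(range(length))]
    if strategy == "one_shot":
        # Non-autoregressive order: all positions in a single step,
        # so the number of model calls does not grow with the length.
        return [list(range(length))] if length > 0 else []
    raise ValueError(f"unknown strategy: {strategy}")


def decode(
    source: List[str],
    length: int,
    strategy: str,
    predict: Callable[[List[str], List[str]], List[str]],
) -> List[str]:
    """Fill a fully masked target by following the chosen unmasking schedule.

    `predict` stands in for the model: given the source and the partially
    masked target, it returns one token guess for every target position.
    """
    target = [MASK] * length
    for positions in unmasking_schedule(length, strategy):
        guess = predict(source, target)
        for i in positions:
            target[i] = guess[i]
    return target
```

With a real model in place of `predict`, `"left_to_right"` would need one model call per target token, while `"one_shot"` would need a single call regardless of length, which is the sense in which the abstract speaks of constant-time translation.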