The Missing Ingredient in Zero-Shot Neural Machine Translation

Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Roee Aharoni, Melvin Johnson, Wolfgang Macherey

Sep 27, 2018 (modified: Nov 16, 2018) ICLR 2019 Conference Withdrawn Submission
  • Abstract: Multilingual Neural Machine Translation (NMT) systems are capable of translating between multiple source and target languages within a single system. An important indicator of generalization within these systems is the quality of zero-shot translation - translating between language pairs that the system has never seen during training. However, until now, the zero-shot performance of multilingual models has lagged far behind the quality that can be achieved by using a two-step translation process that pivots through an intermediate language (usually English). In this work, we diagnose why multilingual models underperform in zero-shot settings. We propose explicit language-invariance losses that guide an NMT encoder towards learning language-agnostic representations. Our proposed strategies significantly improve zero-shot translation performance on WMT English-French-German and on the IWSLT 2017 shared task, and for the first time, match the performance of pivoting approaches while maintaining performance on supervised directions.
  • Keywords: Machine Translation, Multi-lingual processing, Zero-Shot translation
  • TL;DR: Simple similarity constraints on top of multilingual NMT enable high-quality translation between unseen language pairs for the first time.
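The abstract describes adding an explicit language-invariance loss that pushes the encoder towards language-agnostic representations. As a minimal sketch of what such a similarity constraint could look like, assuming mean pooling over encoder time steps and a cosine-distance penalty between a source sentence and its translation (the function name and pooling choice are illustrative, not taken from the paper):

```python
import numpy as np

def cosine_alignment_loss(src_repr, tgt_repr):
    """Hypothetical language-invariance loss: penalize the cosine distance
    between mean-pooled encoder representations of a source sentence and
    its translation in another language.

    src_repr, tgt_repr: arrays of shape (num_tokens, hidden_dim).
    Returns 0 when the pooled representations point in the same direction.
    """
    src = src_repr.mean(axis=0)  # mean-pool over time: (T, d) -> (d,)
    tgt = tgt_repr.mean(axis=0)
    cos = np.dot(src, tgt) / (np.linalg.norm(src) * np.linalg.norm(tgt))
    return 1.0 - cos

# Toy usage: identical pooled representations give (near-)zero loss,
# orthogonal ones give a loss of 1.
h = np.random.default_rng(0).normal(size=(5, 8))
print(cosine_alignment_loss(h, h.copy()))
print(cosine_alignment_loss(np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])))
```

In training, a term like this would be added to the usual translation cross-entropy so that parallel sentences in different languages are encoded similarly, which is what makes unseen (zero-shot) pairs decodable.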