The Missing Ingredient in Zero-Shot Neural Machine Translation

Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Roee Aharoni, Melvin Johnson, Wolfgang Macherey

Sep 27, 2018 (modified: Nov 16, 2018) ICLR 2019 Conference Withdrawn Submission
  • Abstract: Multilingual Neural Machine Translation (NMT) systems are capable of translating between multiple source and target languages within a single system. An important indicator of generalization within these systems is the quality of zero-shot translation - translating between language pairs that the system has never seen during training. However, until now, the zero-shot performance of multilingual models has lagged far behind the quality that can be achieved by using a two-step translation process that pivots through an intermediate language (usually English). In this work, we diagnose why multilingual models underperform in zero-shot settings. We propose explicit language-invariance losses that guide an NMT encoder towards learning language-agnostic representations. Our proposed strategies significantly improve zero-shot translation performance on WMT English-French-German and on the IWSLT 2017 shared task, and for the first time, match the performance of pivoting approaches while maintaining performance on supervised directions.
  • Keywords: Machine Translation, Multi-lingual processing, Zero-Shot translation
  • TL;DR: Simple similarity constraints on top of multilingual NMT enable high-quality translation between unseen language pairs for the first time.
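The abstract describes adding an explicit language-invariance loss that pushes the encoder towards language-agnostic representations. As a minimal sketch of what such a similarity constraint could look like, assuming mean pooling over encoder time steps and a cosine-distance penalty between a source sentence and its translation (the function name and pooling choice are illustrative, not taken from the paper):

```python
import numpy as np

def cosine_alignment_loss(src_repr, tgt_repr):
    """Hypothetical language-invariance loss: penalize the cosine distance
    between mean-pooled encoder representations of a source sentence and
    its translation in another language.

    src_repr, tgt_repr: arrays of shape (num_tokens, hidden_dim).
    Returns 0 when the pooled representations point in the same direction.
    """
    src = src_repr.mean(axis=0)  # mean-pool over time: (T, d) -> (d,)
    tgt = tgt_repr.mean(axis=0)
    cos = np.dot(src, tgt) / (np.linalg.norm(src) * np.linalg.norm(tgt))
    return 1.0 - cos

# Toy usage: identical pooled representations give (near-)zero loss,
# orthogonal ones give a loss of 1.
h = np.random.default_rng(0).normal(size=(5, 8))
print(cosine_alignment_loss(h, h.copy()))
print(cosine_alignment_loss(np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])))
```

In training, a term like this would be added to the usual translation cross-entropy so that parallel sentences in different languages are encoded similarly, which is what makes unseen (zero-shot) pairs decodable.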