Enhancing Zero-Shot Translation in Multilingual Neural Machine Translation: Focusing on Obtaining Location-Agnostic Representations

Published: 01 Jan 2024 · Last Modified: 15 May 2025 · ICANN (7) 2024 · CC BY-SA 4.0
Abstract: In the field of multilingual neural machine translation, a notable challenge is zero-shot translation, where a model translates between language pairs it has not been trained on. This often results in poor translation quality, mainly because the model’s internal language representations are too specific to its training languages. We show that the positional relationship to input tokens is a primary factor contributing to these language-specific representations. We address this by modifying the model’s architecture, specifically by removing certain connections in its encoder layer. This simple change significantly improves zero-shot translation quality, with gains of up to 11.1 BLEU points, without degrading translation quality on the supervised language pairs. Moreover, our method facilitates the seamless incorporation of new languages, significantly broadening translation coverage.
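The abstract does not specify which connections are removed, but one plausible reading is dropping the residual (skip) connection around the self-attention sub-layer in an encoder layer, so that position information carried by the input embeddings no longer flows directly to the output. The sketch below illustrates that idea in PyTorch; the class name, layer sizes, and the `keep_residual` flag are our own illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class EncoderLayerSketch(nn.Module):
    """Transformer encoder layer with an optional residual connection
    around self-attention (hypothetical illustration, not the paper's code)."""

    def __init__(self, d_model=64, nhead=4, dim_ff=128, keep_residual=True):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, dim_ff),
            nn.ReLU(),
            nn.Linear(dim_ff, d_model),
        )
        self.keep_residual = keep_residual

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        attn_out, _ = self.self_attn(x, x, x)
        if self.keep_residual:
            x = self.norm1(x + attn_out)  # standard Transformer sub-layer
        else:
            # Dropping the residual path cuts the direct route by which
            # positional information in the input embeddings reaches the
            # output, encouraging location-agnostic representations.
            x = self.norm1(attn_out)
        return self.norm2(x + self.ff(x))


layer = EncoderLayerSketch(keep_residual=False)
out = layer(torch.randn(2, 5, 64))
```

In a full model, only one (or a few) middle encoder layers would typically receive this modification, leaving the rest of the network unchanged so supervised translation quality is preserved.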