Keywords: Multilingual Neural Machine Translation, shortcut learning, generalization
TL;DR: MNMT trained with a single centric language suffers from off-target issues due to overfitting to shortcut patterns of language mappings. Multilingual pretraining aggravates such overfitting. We propose a simple training strategy to eliminate these shortcut patterns.
Abstract: In this study, we connect the commonly cited off-target issues in zero-shot translation to the use of a single centric language in the training data of multilingual neural machine translation (MNMT). Through carefully designed experiments on different MNMT scenarios and models, we attribute off-target issues to overfitting to the shortcut patterns of (non-centric, centric) language mappings. Specifically, the learned shortcut patterns bias MNMT models to mistakenly translate a non-centric source language into the centric language instead of the expected non-centric target language. We analyze the learning dynamics of MNMT and find that shortcut learning generally occurs in the later stage of model training. Pretraining accelerates and aggravates shortcut learning via a fast transformation from the copy pattern embedded in the pretraining initialization to the (non-centric, centric) mapping pattern embedded in the MNMT data. Based on these observations, we propose a simple and effective training strategy that eliminates the shortcut patterns in MNMT models by leveraging the forgetting nature of model training. The only difference between our approach and conventional training is that, in the later stage of training, we present MNMT models only with training examples of (centric, non-centric) language mappings, excluding the reverse direction. Without introducing any additional data or computational cost, our approach can consistently and significantly improve zero-shot translation performance by alleviating shortcut learning, while maintaining supervised translation performance for different MNMT models on several benchmarks.
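The two-stage data schedule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Example record, the select_training_examples helper, the switch_step threshold, and the choice of English ("en") as the centric language are all hypothetical. The sketch only shows which examples are presented at a given step; it adds no data and no extra computation, matching the claim that the approach differs from conventional training only in the later-stage example selection.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Example:
    # Hypothetical record for one parallel sentence pair.
    src_lang: str
    tgt_lang: str
    src: str
    tgt: str


def select_training_examples(
    examples: Iterable[Example],
    step: int,
    switch_step: int,  # hypothetical step at which the "later stage" begins
    centric: str = "en",  # assumed centric language
) -> List[Example]:
    """Return the examples presented at a given training step.

    Before switch_step: all directions, as in conventional training.
    From switch_step on: only (centric -> non-centric) examples; the reverse
    (non-centric -> centric) direction is withheld so that the model can
    gradually forget that shortcut mapping.
    """
    examples = list(examples)
    if step < switch_step:
        return examples
    return [ex for ex in examples if ex.src_lang == centric and ex.tgt_lang != centric]
```

For example, with English-centric data containing en-de, de-en, and en-fr pairs, all three pairs are used before switch_step, and only the en-de and en-fr pairs are used afterwards.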
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)