XLIT: A Method to Bridge Task Discrepancy in Machine Translation Pre-training

Published: 2024, Last Modified: 23 Jan 2026. ACM Trans. Asian Low Resour. Lang. Inf. Process. 2024. License: CC BY-SA 4.0
Abstract: Transfer learning from pre-trained language models to encoder-decoder translation models is hindered by the mismatch between the pre-training and fine-tuning tasks: pre-trained models are not explicitly trained to capture the semantic interactions between languages. To address this issue, a cross-lingual embedding space has been used as an interface during the pre-training phase, which lets the decoder inputs attend to the encoder outputs much as they do during fine-tuning. However, the effectiveness of this transfer relies heavily on the quality of the pre-trained unsupervised cross-lingual embeddings, which adds complexity and reduces reproducibility. In this study, we propose a pre-training method called Cross-lingual Interaction Transfer (XLIT) that does not depend on external embedding techniques and effectively reconciles the task discrepancy with machine translation fine-tuning. We conducted extensive experiments on four low-resource and six very low-resource translation directions. The results show that our method surpasses randomly initialized models and previous pre-training techniques by up to 9.4 BLEU. Furthermore, our method achieves comparable performance when pre-trained with large-scale monolingual data from various languages.
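The abstract's central idea is that pre-training should already exercise the encoder-decoder cross-attention that fine-tuning will rely on. The sketch below illustrates that general pattern with a denoising-style pre-training step in which the decoder reconstructs a sentence while cross-attending to encoder outputs. It is a minimal, hypothetical illustration, not the authors' implementation: the model sizes, vocabulary, noising scheme, and the `pretrain_step` helper are all assumptions introduced here.

```python
# Minimal sketch (not the authors' code): one pre-training step in which the
# decoder cross-attends to encoder outputs, mirroring the attention pattern
# used later during translation fine-tuning. All hyperparameters and the
# shared-vocabulary size are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB_SIZE = 32000   # assumed shared multilingual vocabulary
D_MODEL = 512

class EncoderDecoderLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=8,
            num_encoder_layers=6, num_decoder_layers=6,
            batch_first=True,
        )
        self.lm_head = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, src_ids, tgt_ids):
        src = self.embed(src_ids)
        tgt = self.embed(tgt_ids)
        # Causal mask: each decoder position sees only earlier target tokens;
        # cross-attention to the encoder output is unrestricted, as in fine-tuning.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.lm_head(hidden)

def pretrain_step(model, optimizer, noisy_src, clean_tgt, loss_fn):
    """One denoising step: encode a corrupted monolingual sentence and let the
    decoder reconstruct the clean sentence through cross-attention."""
    logits = model(noisy_src, clean_tgt[:, :-1])  # teacher forcing
    loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), clean_tgt[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = EncoderDecoderLM()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    # Toy batch of token ids standing in for (noised, original) monolingual pairs.
    noisy = torch.randint(0, VOCAB_SIZE, (2, 16))
    clean = torch.randint(0, VOCAB_SIZE, (2, 17))
    print(pretrain_step(model, optimizer, noisy, clean, loss_fn))
```

Because the decoder already attends to encoder states during this kind of pre-training, the attention pathways exercised at fine-tuning time are not initialized from scratch, which is the task-discrepancy issue the abstract describes.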