Abstract: Transfer learning from pre-trained language models to encoder-decoder translation models is hindered by the mismatch between the pre-training and fine-tuning tasks: pre-trained models are not explicitly trained to capture the semantic interactions between different languages. One existing approach addresses this by using a cross-lingual embedding space as an interface during pre-training, so that decoder inputs attend to encoder outputs just as they do during fine-tuning. However, the effectiveness of this transfer depends heavily on the quality of the pre-trained unsupervised cross-lingual embeddings, which adds complexity and reduces reproducibility. In this study, we propose a pre-training method called Cross-lingual Interaction Transfer (XLIT), which does not depend on external embedding techniques and effectively reconciles the task discrepancy in machine translation fine-tuning. We conducted extensive experiments on four low-resource and six very low-resource translation directions. Our results show that our method surpasses randomly initialized models and previous pre-training techniques by up to 9.4 BLEU. Furthermore, our method achieves comparable performance when pre-trained with large-scale monolingual data from various languages.
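To make the architectural idea concrete, the sketch below illustrates a pre-training step in which decoder inputs cross-attend to encoder outputs, mirroring the encoder-decoder interaction later used in translation fine-tuning. This is only an illustration under assumed choices (a shared embedding table standing in for the cross-lingual interface, a generic denoising-style objective, and arbitrary dimensions); it is not the authors' XLIT implementation.

```python
# Illustrative sketch only (not the paper's code): an encoder-decoder
# pre-training step whose decoder cross-attends to encoder outputs,
# the same interaction exercised during MT fine-tuning.
import torch
import torch.nn as nn

VOCAB, D_MODEL, PAD = 32000, 512, 0  # assumed sizes for illustration


class EncoderDecoderLM(nn.Module):
    def __init__(self):
        super().__init__()
        # One embedding table shared by encoder and decoder stands in for
        # the cross-lingual embedding interface described in the abstract.
        self.embed = nn.Embedding(VOCAB, D_MODEL, padding_idx=PAD)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=8,
            num_encoder_layers=6, num_decoder_layers=6,
            batch_first=True,
        )
        self.lm_head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, src_ids, tgt_ids):
        # Decoder states attend to encoder outputs via cross-attention,
        # so pre-training already uses the same interface as fine-tuning.
        causal = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(
            self.embed(src_ids), self.embed(tgt_ids),
            tgt_mask=causal,
            src_key_padding_mask=src_ids.eq(PAD),
        )
        return self.lm_head(hidden)


# One hypothetical monolingual pre-training step: reconstruct a sequence
# from a corrupted copy of itself (a common denoising-style objective).
model = EncoderDecoderLM()
tokens = torch.randint(1, VOCAB, (2, 16))  # stand-in monolingual batch
noisy = tokens.masked_fill(torch.rand(tokens.shape) < 0.3, PAD)  # crude corruption
logits = model(noisy, tokens[:, :-1])  # teacher forcing on the clean sequence
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1), ignore_index=PAD
)
loss.backward()
```

Because the decoder already conditions on encoder outputs during pre-training, the fine-tuning stage for translation reuses the same attention pathway rather than introducing it from scratch.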
External IDs: dblp:journals/talip/PhamND24