Abstract: The strong performance of Neural Machine Translation (NMT) normally relies on large amounts of parallel data, yet bilingual data between many language pairs are insufficient. mBART improves low-resource translation by pre-training on multilingual monolingual data and then fine-tuning on bilingual data, but it does not leverage parallel data, which contains crucial alignment information between languages. In this paper, we propose to use English-centric parallel data in a Multilingual NMT (MNMT) manner with English as the pivot, providing translation and alignment information for translation between Chinese and other languages. We conduct experiments on the CCMT 2023 low-resource machine translation task between Chinese and the languages along “the Belt and Road”. Our method improves the zh→vi, vi→zh, zh→mn, mn→zh, zh→cs and cs→zh tasks by +1.65, +0.24, +0.91, +3.47, +2.88 and +6.35 BLEU respectively over the strong mBART baseline, demonstrating the effectiveness of our approach and the importance of English-centric parallel data.