Large-scale Pretraining for Neural Machine Translation with Tens of Billions of Sentence Pairs

Yuxian Meng; Xiangyuan Ren; Zijun Sun; Xiaoya Li; Arianna Yuan; Fei Wu; Jiwei Li

25 Sept 2019 (modified: 22 Jun 2025), ICLR 2020 Conference Blind Submission

Abstract: In this paper, we investigate the problem of training neural machine translation (NMT) systems on a dataset of more than 40 billion bilingual sentence pairs, which is orders of magnitude larger than the largest dataset used to date. Compared with previous NMT work, this setting raises unprecedented challenges, including severe noise in the data and prohibitively long training times. We propose practical solutions to these issues and demonstrate that large-scale pretraining significantly improves NMT performance. On the WMT17 Chinese-English dataset, we push the BLEU score to 32.3, a gain of +3.2 BLEU over existing state-of-the-art results.