Abstract: Neural Machine Translation (NMT) models are typically trained on heterogeneous data that are concatenated and randomly shuffled. Curriculum training aims to present the data to the NMT system in a meaningful order. In this work, we introduce a two-stage curriculum training framework for NMT in which we fine-tune a base NMT model on subsets of data, selected by both deterministic scoring using pre-trained methods and online scoring that considers the prediction scores of the emerging NMT model. Through extensive experiments on six language pairs comprising low- and high-resource languages from WMT'21, we show that our curriculum strategies consistently achieve better quality (up to +2.2 BLEU improvement) and faster convergence (approximately 50% fewer updates).
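To make the two-stage selection concrete, below is a minimal Python sketch of the general pattern the abstract describes: rank the training data with a fixed pre-trained scorer first, then re-rank with scores from the evolving model. All names here (`select_subset`, `pretrained_score`, `model_score`, `fine_tune`, the `keep_ratio` parameter) are illustrative assumptions, not the paper's actual implementation or hyperparameters.

```python
# Hedged sketch of two-stage curriculum fine-tuning, NOT the paper's code.
# Stage 1 uses a deterministic pre-trained scorer; stage 2 uses online
# scores from the emerging NMT model to re-select the training subset.
from typing import Callable, List, Tuple

SentPair = Tuple[str, str]  # (source sentence, target sentence)


def select_subset(
    data: List[SentPair],
    score_fn: Callable[[SentPair], float],
    keep_ratio: float,
) -> List[SentPair]:
    """Keep the top-scoring fraction of sentence pairs (higher score = preferred)."""
    ranked = sorted(data, key=score_fn, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_ratio))]


def curriculum_fine_tune(
    model,
    data: List[SentPair],
    pretrained_score: Callable[[SentPair], float],        # fixed, deterministic scorer
    model_score: Callable[[object, SentPair], float],     # online: queries current model
    fine_tune: Callable[[object, List[SentPair]], None],  # one fine-tuning pass (assumed)
    online_rounds: int = 1,
    keep_ratio: float = 0.5,
):
    # Stage 1: select data once with the pre-trained scorer and fine-tune.
    subset = select_subset(data, pretrained_score, keep_ratio)
    fine_tune(model, subset)

    # Stage 2: re-select with the emerging model's own prediction scores,
    # so the subset adapts as the model improves.
    for _ in range(online_rounds):
        subset = select_subset(data, lambda ex: model_score(model, ex), keep_ratio)
        fine_tune(model, subset)
    return model
```

One plausible instantiation, under the same assumptions: `pretrained_score` could be a sentence-pair quality score from an external scorer, and `model_score` the current model's per-sentence log-probability of the reference, but the abstract does not specify either choice.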