Knowledge Distillation based Ensemble Learning for Neural Machine Translation

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission
Keywords: Knowledge Distillation, Ensemble Learning, Neural Machine Translation
Abstract: Model ensembling can effectively improve the accuracy of neural machine translation, but it comes at the cost of large computation and memory requirements. Moreover, model ensembling cannot combine the strengths of translation models with different decoding strategies, since their translation probabilities cannot be directly aggregated. In this paper, we introduce an ensemble learning framework based on knowledge distillation that aggregates the knowledge of multiple teacher models into a single student model. Under this framework, we introduce word-level ensemble learning and sequence-level ensemble learning for neural machine translation, where sequence-level ensemble learning is capable of aggregating translation models with different decoding strategies. Experimental results on multiple translation tasks show that, by combining the two ensemble learning methods, our approach achieves substantial improvements over competitive baseline systems and establishes a new single-model state-of-the-art BLEU score of 31.13 on the WMT14 English-German translation task. (We will release the source code and the created SEL training data for reproducibility.)
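
A minimal sketch of what the word-level ensemble learning (WEL) objective described in the abstract could look like, assuming PyTorch and decoder output logits over a shared vocabulary; the function name, tensor shapes, and temperature parameter are illustrative assumptions, not the paper's released implementation. The teachers' word-level distributions are averaged and the student is trained to match that average; sequence-level ensemble learning (SEL) would instead train the student on translations generated by the teachers, which is not shown here.

# Hypothetical sketch of word-level ensemble knowledge distillation (WEL);
# model architecture, loss weighting, and data pipeline in the paper may differ.
import torch
import torch.nn.functional as F

def wel_distillation_loss(student_logits, teacher_logits_list, temperature=1.0):
    """KL divergence between the averaged teacher distribution and the student.

    student_logits:      (batch, seq_len, vocab) logits from the single student.
    teacher_logits_list: list of (batch, seq_len, vocab) logits, one per teacher.
    """
    # Average the teachers' word-level probability distributions.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(teacher || student), averaged over the batch dimension.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# Toy usage with random logits (2 teachers, batch=2, seq_len=3, vocab=10).
if __name__ == "__main__":
    torch.manual_seed(0)
    student = torch.randn(2, 3, 10, requires_grad=True)
    teachers = [torch.randn(2, 3, 10) for _ in range(2)]
    loss = wel_distillation_loss(student, teachers)
    loss.backward()
    print(f"WEL distillation loss: {loss.item():.4f}")
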
One-sentence Summary: We propose an ensemble learning method for NMT to aggregate the knowledge of multiple models into a single model.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): /references/pdf?id=FiOiVOa-4Y