Distilling Estonian Text Domains for Production-Oriented Machine Translation

Published: 20 Mar 2023, Last Modified: 18 Apr 2023
Venue: NoDaLiDa 2023
Keywords: neural machine translation, knowledge distillation, multi-domain machine translation
TL;DR: We explore multi-domain knowledge distillation for NMT with tiny student models for Estonian-English
Abstract: This paper explores knowledge distillation for multi-domain neural machine translation (NMT). We focus on the Estonian-English translation direction and experiment with distilling the knowledge of multiple domain-specific teacher models into a single student model that is tiny and efficient. Our experiments use a large parallel dataset of 18 million sentence pairs, consisting of 10 corpora, divided into 6 domain groups based on source similarity, and incorporate forward-translated monolingual data. Results show that tiny student models can cope with multiple domains even in the case of large corpora, with different approaches benefiting frequent and low-resource domains.
Student Paper: Yes, the first author is a student
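For orientation, the sketch below illustrates the general idea of sequence-level knowledge distillation across domains as described in the abstract: domain-specific teacher models forward-translate the source side of their own domains, and the pooled synthetic pairs are used to train a single tiny student. This is a generic illustration, not the authors' pipeline; `load_teacher`, `load_sources`, and `train_student` are hypothetical helpers.

```python
# Minimal sketch (assumed setup, not the paper's code) of multi-domain
# sequence-level knowledge distillation for Estonian-English NMT.
from typing import Callable, Dict, List, Tuple

def distill_multi_domain(
    teachers: Dict[str, Callable[[List[str]], List[str]]],  # domain -> teacher translate fn
    domain_sources: Dict[str, List[str]],                    # domain -> Estonian source sentences
) -> List[Tuple[str, str]]:
    """Build a pooled synthetic parallel corpus (et, en) for student training."""
    synthetic_pairs: List[Tuple[str, str]] = []
    for domain, translate in teachers.items():
        sources = domain_sources.get(domain, [])
        # Forward-translate: the teacher's output replaces the human reference,
        # which is the core of sequence-level knowledge distillation.
        hypotheses = translate(sources)
        synthetic_pairs.extend(zip(sources, hypotheses))
    return synthetic_pairs

# Usage (hypothetical helpers and domain names):
# teachers = {d: load_teacher(d) for d in ["legal", "news", "subtitles"]}
# data = distill_multi_domain(teachers, {d: load_sources(d) for d in teachers})
# train_student(data, model_size="tiny")
```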