Distilling Estonian Text Domains for Production-Oriented Machine Translation

Published: 20 Mar 2023, Last Modified: 18 Apr 2023
Venue: NoDaLiDa 2023
Keywords: neural machine translation, knowledge distillation, multi-domain machine translation
TL;DR: We explore multi-domain knowledge distillation for NMT with tiny student models for Estonian-English
Abstract: This paper explores knowledge distillation for multi-domain neural machine translation (NMT). We focus on the Estonian-English translation direction and experiment with distilling the knowledge of multiple domain-specific teacher models into a single student model that is tiny and efficient. Our experiments use a large parallel dataset of 18 million sentence pairs, consisting of 10 corpora, divided into 6 domain groups based on source similarity, and incorporate forward-translated monolingual data. Results show that tiny student models can cope with multiple domains even in the case of large corpora, with different approaches benefiting frequent and low-resource domains.
Student Paper: Yes, the first author is a student
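For orientation, the sketch below illustrates the general idea of sequence-level knowledge distillation across domains as described in the abstract: domain-specific teacher models forward-translate the source side of their own domains, and the pooled synthetic pairs are used to train a single tiny student. This is a generic illustration, not the authors' pipeline; `load_teacher`, `load_sources`, and `train_student` are hypothetical helpers.

```python
# Minimal sketch (assumed setup, not the paper's code) of multi-domain
# sequence-level knowledge distillation for Estonian-English NMT.
from typing import Callable, Dict, List, Tuple

def distill_multi_domain(
    teachers: Dict[str, Callable[[List[str]], List[str]]],  # domain -> teacher translate fn
    domain_sources: Dict[str, List[str]],                    # domain -> Estonian source sentences
) -> List[Tuple[str, str]]:
    """Build a pooled synthetic parallel corpus (et, en) for student training."""
    synthetic_pairs: List[Tuple[str, str]] = []
    for domain, translate in teachers.items():
        sources = domain_sources.get(domain, [])
        # Forward-translate: the teacher's output replaces the human reference,
        # which is the core of sequence-level knowledge distillation.
        hypotheses = translate(sources)
        synthetic_pairs.extend(zip(sources, hypotheses))
    return synthetic_pairs

# Usage (hypothetical helpers and domain names):
# teachers = {d: load_teacher(d) for d in ["legal", "news", "subtitles"]}
# data = distill_multi_domain(teachers, {d: load_sources(d) for d in teachers})
# train_student(data, model_size="tiny")
```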