Towards Unified Multi-Domain Machine Translation With Mixture of Domain Experts

Published: 01 Jan 2023 (Last Modified: 21 Feb 2024) · IEEE/ACM Trans. Audio Speech Lang. Process. 2023
Abstract: Multi-domain machine translation (MDMT) aims to build models on mixed-domain training corpora that can switch translation behavior between domains. Previous studies either assume that the domain information is given and use that knowledge to guide translation, or assume it is unknown and have the model recognize it automatically. In practice, however, the two cases are mixed: some sentences carry domain labels while others do not, a setting beyond the capacity of previous methods. In this article, we propose a unified MDMT model with a mixture of sub-networks (experts) that handles inputs both with and without domain labels. The mixture comprises a shared expert and multiple domain-specific experts. For inputs with domain labels, the model routes through the shared expert and the corresponding domain-specific expert. For unlabeled inputs, it activates all the experts, each of which makes a dynamic contribution. Experimental results on multiple diverse domains in De→En, Fr→En, and En→Ro show that our method outperforms strong baselines in both scenarios, with and without domain labels. Further analyses show that the model generalizes well when transferred to new domains.
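The routing described in the abstract — labeled inputs go through the shared expert plus the matching domain expert, while unlabeled inputs mix all domain experts via a dynamic gate — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear experts, the softmax gate, and all parameter shapes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_domains = 8, 3

# Hypothetical toy parameters: one shared expert plus one expert per domain,
# and a small gating matrix used only when no domain label is available.
W_shared = rng.normal(size=(d_model, d_model))
W_domain = [rng.normal(size=(d_model, d_model)) for _ in range(n_domains)]
W_gate = rng.normal(size=(d_model, n_domains))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def moe_forward(x, domain=None):
    """Route a representation x through the mixture of experts.

    With a domain label, only that domain's expert is combined with the
    shared expert; without one, every domain expert contributes, weighted
    by a gate computed from x itself (the "dynamic contribution")."""
    shared_out = x @ W_shared
    if domain is not None:
        # Labeled case: shared expert + the corresponding domain expert.
        return shared_out + x @ W_domain[domain]
    # Unlabeled case: all experts fire, mixed by learned gate weights.
    gates = softmax(x @ W_gate)
    mixed = sum(g * (x @ W) for g, W in zip(gates, W_domain))
    return shared_out + mixed

x = rng.normal(size=d_model)
labeled_out = moe_forward(x, domain=1)
unlabeled_out = moe_forward(x)
```

In a Transformer MDMT model the experts would replace (or augment) feed-forward sub-layers and the gate would be trained jointly with the translation objective; the sketch only shows the routing logic.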
