Multilingual \textit{k}-Nearest-Neighbor Machine Translation

Published: 07 Oct 2023, Last Modified: 01 Dec 2023, EMNLP 2023 Main
Submission Type: Regular Short Paper
Submission Track: Machine Translation
Submission Track 2: Multilinguality and Linguistic Diversity
Keywords: multilingual machine translation, semi-parametric, kNN-MT
Abstract: \textit{k}-nearest-neighbor machine translation has demonstrated remarkable improvements in machine translation quality by creating a datastore of cached examples. However, these improvements have been limited to high-resource language pairs with large datastores, and remain a challenge for low-resource languages. In this paper, we address this issue by combining representations from multiple languages into a single datastore. Our results consistently demonstrate substantial improvements not only in low-resource translation quality (up to $+3.6$ BLEU), but also in high-resource translation quality (up to $+0.5$ BLEU). Our experiments also show that, by using linguistic similarities for datastore creation, it is possible to create multilingual datastores that are a quarter of the size and achieve a $5.3\times$ speed improvement.\footnote{We will release our code upon acceptance.}
Submission Number: 1640
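
The idea in the abstract of pooling several languages into a single datastore can be illustrated with a minimal sketch. It assumes the commonly used kNN-MT formulation, in which a distribution built from retrieved neighbors is interpolated with the translation model's distribution; it is not the authors' implementation. All function names, the toy 2-dimensional keys, the temperature, and the interpolation weight `lam` are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def build_datastore(examples):
    """Store (decoder representation, target token) pairs as a plain list."""
    return [(tuple(key), token) for key, token in examples]


def merge_datastores(*datastores):
    """Pool entries from several language pairs into one multilingual datastore."""
    merged = []
    for store in datastores:
        merged.extend(store)
    return merged


def knn_distribution(datastore, query, k=2, temperature=1.0):
    """Turn the k nearest entries (squared L2 distance) into a token distribution."""
    def sq_dist(key):
        return sum((a - b) ** 2 for a, b in zip(key, query))

    neighbors = sorted(datastore, key=lambda entry: sq_dist(entry[0]))[:k]
    weights = defaultdict(float)
    for key, token in neighbors:
        # Closer neighbors receive exponentially larger weight.
        weights[token] += math.exp(-sq_dist(key) / temperature)
    total = sum(weights.values())
    return {token: weight / total for token, weight in weights.items()}


def interpolate(p_model, p_knn, lam=0.5):
    """Mix the model distribution with the retrieval distribution."""
    vocab = set(p_model) | set(p_knn)
    return {
        token: lam * p_knn.get(token, 0.0) + (1 - lam) * p_model.get(token, 0.0)
        for token in vocab
    }


# Toy data (assumed): a small low-resource store and a larger related one.
low_resource = build_datastore([((0.0, 1.0), "house")])
high_resource = build_datastore([((0.1, 0.9), "house"), ((5.0, 5.0), "tree")])
multilingual = merge_datastores(low_resource, high_resource)

p_knn = knn_distribution(multilingual, query=(0.0, 1.0), k=2)
p_model = {"house": 0.4, "tree": 0.6}
p_final = interpolate(p_model, p_knn, lam=0.5)
```

In this toy setting both retrieved neighbors vote for the same token, so retrieval shifts probability mass toward it even though one of the neighbors comes from a different language pair. A practical system would replace the exhaustive sort with an approximate nearest-neighbor index.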