Implementing and Evaluating Multi-source Retrieval-Augmented Translation

Tommi Nieminen, Jörg Tiedemann, Sami Virpioja

Published: 01 Nov 2025, Last Modified: 24 Mar 2026. Proceedings of the Tenth Conference on Machine Translation. License: CC BY-SA 4.0

Abstract: In recent years, neural machine translation (NMT) systems have been integrated with external databases with the aim of improving machine translation (MT) quality and enforcing domain-specific terminology and other conventions in the MT output. Most work on incorporating external knowledge into NMT has concentrated on integrating a single source of information, usually either a terminology database or a translation memory. However, in real-life translation scenarios, all relevant knowledge sources should be used in parallel. In this article, we evaluate different methods of integrating external knowledge from multiple sources in a single NMT system. In addition to training single models to utilize multiple kinds of information, we also ensemble models that have each been trained to utilize a single type of information. We evaluate our models against state-of-the-art LLMs using an extensive, purpose-built English-to-Finnish test suite.
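The abstract does not say how the single-source models are combined. A common way to ensemble NMT models is to average their next-token probability distributions at every decoding step. The sketch below assumes that approach, using greedy search and hypothetical toy scorers (`make_toy_model`, `model_a`, `model_b`) that stand in for, say, a terminology-aware model and a translation-memory-aware model. It illustrates the general technique only and is not the authors' implementation.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a model maps (source sentence, target prefix)
# to next-token log-probabilities over a vocabulary shared by all models.
Scorer = Callable[[str, List[str]], Dict[str, float]]


def ensemble_log_probs(models: Sequence[Scorer], source: str,
                       prefix: List[str]) -> Dict[str, float]:
    """Return log of the mean next-token probability across models."""
    per_model = [m(source, prefix) for m in models]
    result = {}
    for tok in per_model[0]:
        lps = [lp[tok] for lp in per_model]
        top = max(lps)
        # log-sum-exp for numerical stability, then divide by model count
        result[tok] = (top + math.log(sum(math.exp(x - top) for x in lps))
                       - math.log(len(lps)))
    return result


def greedy_ensemble_decode(models: Sequence[Scorer], source: str,
                           eos: str = "</s>", max_len: int = 50) -> List[str]:
    """Greedy decoding: pick the best ensembled token until EOS."""
    prefix: List[str] = []
    for _ in range(max_len):
        scores = ensemble_log_probs(models, source, prefix)
        token = max(scores, key=scores.get)
        if token == eos:
            break
        prefix.append(token)
    return prefix


def make_toy_model(steps: List[Dict[str, float]]) -> Scorer:
    """Hypothetical toy model: fixed probabilities per decoding step."""
    def scorer(source: str, prefix: List[str]) -> Dict[str, float]:
        dist = steps[min(len(prefix), len(steps) - 1)]
        return {tok: math.log(p) for tok, p in dist.items()}
    return scorer


# Two toy "single-source" models that disagree on the first token.
model_a = make_toy_model([{"terve": 0.7, "hei": 0.2, "</s>": 0.1},
                          {"terve": 0.1, "hei": 0.1, "</s>": 0.8}])
model_b = make_toy_model([{"terve": 0.3, "hei": 0.6, "</s>": 0.1},
                          {"terve": 0.05, "hei": 0.05, "</s>": 0.9}])

if __name__ == "__main__":
    print(greedy_ensemble_decode([model_a], "hello"))
    print(greedy_ensemble_decode([model_b], "hello"))
    print(greedy_ensemble_decode([model_a, model_b], "hello"))
```

On their own, `model_a` prefers "terve" and `model_b` prefers "hei"; the ensemble averages the first-step probabilities (0.5 vs. 0.4) and outputs "terve". Averaging probabilities is only one option: some toolkits instead sum weighted log-probabilities, and real systems normally use beam search rather than greedy search.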

External IDs: doi:10.18653/v1/2025.wmt-1.20