In-Context Example Selection via Similarity Search Improves Low-Resource Machine Translation

Armel Randy Zebaze, Benoît Sagot, Rachel Bawden

Published: 01 Jan 2025, Last Modified: 20 May 2025, NAACL (Findings) 2025, CC BY-SA 4.0

Abstract: The ability of generative large language models (LLMs) to perform in-context learning has given rise to a large body of research into how best to prompt models for various natural language processing tasks. In this paper, we focus on machine translation (MT), a task that has been shown to benefit from in-context translation examples. However, no systematic studies have been published on how best to select examples, and mixed results have been reported on the usefulness of similarity-based selection over random selection, although these results have been shown mainly for high-resource languages. We provide a study covering multiple LLMs and multiple in-context example retrieval strategies. Contrary to previously published results, we find that retrieval based on sentence embedding similarity can improve MT, especially for low-resource language directions, and we also discuss the balance between selection pool diversity and quality. Code and outputs will be made freely available.
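The abstract does not specify the encoder, similarity measure, or prompt template used. The following is only a minimal sketch of the general idea of similarity-based in-context example selection, under stated assumptions: each pool sentence and the query are mapped to vectors, pool pairs are ranked by cosine similarity to the query, and the top-k pairs are placed in a few-shot prompt. The `embed` function here is a hypothetical toy bag-of-words stand-in for a real multilingual sentence-embedding model, and the `build_prompt` template and its language labels are likewise hypothetical.

```python
import math
from collections import Counter


def embed(sentence):
    # Hypothetical stand-in for a sentence encoder: a bag-of-words count vector.
    # A real setup would use a (multilingual) sentence-embedding model instead.
    return Counter(sentence.lower().split())


def cosine(u, v):
    # Cosine similarity between two sparse count vectors; 0.0 if either is empty.
    dot = sum(count * v[token] for token, count in u.items() if token in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_examples(source, pool, k=2):
    # pool: list of (source_sentence, target_translation) pairs.
    # Rank pairs by similarity of their source side to the query; keep the top k.
    query = embed(source)
    ranked = sorted(pool, key=lambda pair: cosine(query, embed(pair[0])), reverse=True)
    return ranked[:k]


def build_prompt(source, examples, src_lang="English", tgt_lang="French"):
    # Hypothetical template: each example as a source/target pair, then the query.
    parts = [f"{src_lang}: {src}\n{tgt_lang}: {tgt}" for src, tgt in examples]
    parts.append(f"{src_lang}: {source}\n{tgt_lang}:")
    return "\n\n".join(parts)


pool = [
    ("the cat sleeps on the mat", "le chat dort sur le tapis"),
    ("I like green apples", "j'aime les pommes vertes"),
    ("the dog sleeps in the garden", "le chien dort dans le jardin"),
]
query = "the cat sleeps in the garden"
selected = select_examples(query, pool, k=2)  # the "dog" pair, then the "cat" pair
prompt = build_prompt(query, selected)
```

Random selection, the baseline the abstract compares against, would simply replace the ranking step with sampling k pairs from `pool`. The size and quality of `pool` matter here too: a larger, more diverse pool gives the retriever more chances to find close matches, which connects to the pool diversity versus quality trade-off mentioned above.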