Keywords: Dictionary-Guided Prompting; Lexicon-Aligned Retrieval; Low-Resource Machine Translation; Terminology Grounding; Retrieval-Augmented Generation (RAG)
TL;DR: A dictionary-guided prompting method (LAP) improves low-resource machine translation, with gains on Tangut→Chinese and on 100-sentence probes in Inuktitut→English and Nahuatl→Spanish.
Abstract: We present Lexicon-Aligned Prompting (LAP), a general methodology that injects bilingual dictionary evidence into large language models (LLMs) for low-resource machine translation (LR-MT). LAP formally separates (i) lexicon–sentence retrieval and (ii) prompt integration. As our main experiment, we retain a Tangut→Chinese setting with strong results for both literal alignment and idiomatic rewriting, and then add two tiny-data probe studies designed to test LAP’s portability under extreme data scarcity: Inuktitut→English and Nahuatl→Spanish. Each probe uses only 100 training sentences. Despite this tiny size, LAP consistently improves chrF and terminology accuracy in both zero-shot and lightweight fine-tuning regimes, with significance supported by paired bootstrap and sign tests. The results demonstrate that LAP offers a transparent, controllable, and reproducible way to ground LR-MT in human-curated lexical knowledge.
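To make the two-step structure of LAP concrete, below is a minimal Python sketch of how lexicon–sentence retrieval and prompt integration might be wired together. It assumes a simple headword-match retriever and an invented prompt template; the function names, prompt wording, and toy lexicon are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of lexicon-aligned prompting (hypothetical names; not the
# authors' code). Step (i): look up dictionary entries whose headwords appear
# in the source sentence. Step (ii): inject them into the translation prompt.

from typing import Dict, List


def retrieve_entries(sentence: str, lexicon: Dict[str, str]) -> List[str]:
    """Return 'source = gloss' lines for every lexicon headword found in the sentence."""
    tokens = set(sentence.split())
    return [f"{src} = {tgt}" for src, tgt in lexicon.items() if src in tokens]


def build_prompt(sentence: str, lexicon: Dict[str, str], tgt_lang: str) -> str:
    """Compose a translation prompt with retrieved dictionary evidence prepended."""
    evidence = retrieve_entries(sentence, lexicon)
    lines = ["Bilingual dictionary entries:"] + (evidence or ["(none found)"])
    lines += [
        "",
        f"Translate into {tgt_lang}, using the entries above where relevant:",
        sentence,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy, invented two-entry lexicon for illustration only.
    toy_lexicon = {"nanuq": "polar bear", "iglu": "house"}
    print(build_prompt("nanuq iglu takujara", toy_lexicon, "English"))
```

The resulting prompt string would then be sent to the LLM; a real retriever could score lexicon entries against the sentence more robustly (e.g., morphology-aware or embedding-based matching) rather than exact token overlap.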
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 1549