CART: Stability-Aware Evidence Selection for Cascade-Augmented Retrieval in Low-Resource Machine Translation

ACL ARR 2026 January Submission10805 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: low-resource machine translation, retrieval-augmented generation, stability-aware prompting, inference-time methods, generator-aware retrieval, indigenous languages
Abstract: Low-resource languages (LRLs) with complex morphosyntax and limited usable parallel data remain challenging for modern translation systems. Although large language models (LLMs) enable cross-lingual transfer, their translations for LRLs often exhibit unstable decoding under weak lexical grounding, leading to semantic drift and hallucination. We study this failure mode in Lakota, a morphologically rich Indigenous language lacking any standard MT system, and introduce CART, an inference-time Cascade-Augmented Retrieval Translation framework that mitigates instability without finetuning or language-specific analyzers. CART progressively constrains model behavior through input canonicalization, multi-channel retrieval, and stability-aware evidence selection, which filters retrieved examples based on whether they induce consistent short-form translations under minimal prompt perturbations rather than similarity alone. On a heterogeneous English–Lakota corpus of approximately 70k mixed-source pairs, CART improves EN→LKT BLEU by +5.8 over direct prompting and by +2.9 over similarity-based retrieval, with corresponding BERTScore gains of +0.07 and +0.03, respectively. These results suggest that generator-aware evidence selection can reduce hallucination and improve robustness in morphologically rich LRL settings, with Lakota serving as an illustrative case study.
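The stability-aware selection step described in the abstract can be illustrated with a minimal sketch: each retrieved example is scored by how consistently the generator translates the query when that example is shown under minimally perturbed prompts, and only high-agreement examples are kept. All names here (`generate`, the perturbation templates, the agreement metric, the threshold) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of stability-aware evidence selection.
# Assumptions: `generate` stands in for an LLM call; surface similarity
# via difflib approximates translation agreement; CART's real scoring differs.
from difflib import SequenceMatcher
from itertools import combinations

def agreement(a: str, b: str) -> float:
    """Surface similarity between two short translations (stand-in metric)."""
    return SequenceMatcher(None, a, b).ratio()

def stability_score(generate, source: str, example: tuple,
                    perturbations: list) -> float:
    """Average pairwise agreement of translations produced when the same
    retrieved example is shown under minimally perturbed prompts."""
    outputs = [generate(p.format(src=example[0], tgt=example[1], query=source))
               for p in perturbations]
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 1.0
    return sum(agreement(a, b) for a, b in pairs) / len(pairs)

def select_stable_evidence(generate, source, candidates, perturbations,
                           threshold=0.8, k=4):
    """Keep up to k retrieved examples whose induced translations stay
    consistent across prompt perturbations, rather than ranking by
    retrieval similarity alone."""
    scored = [(stability_score(generate, source, ex, perturbations), ex)
              for ex in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [ex for s, ex in scored if s >= threshold][:k]
```

An example that makes the generator's output flip between perturbed prompts is filtered out even if it is the nearest retrieval hit, which is the sense in which the selection is generator-aware.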
Paper Type: Long
Research Area: Low-resource Methods for NLP
Research Area Keywords: multilingual MT, less-resourced languages, robustness, indigenous languages, endangered languages, retrieval-augmented generation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English, Lakota
Submission Number: 10805