Low-resource Neural Machine Translation with Large Language Models: A Continuous Self-Improving System
Abstract: Machine translation systems often struggle to maintain quality in low-resource scenarios due to the lack of sufficient parallel data. We present a novel learning framework that continuously (potentially lifelong) improves the performance of Large Language Models (LLMs) on low-resource machine translation through self-optimization. Our system comprises three key components: an Instruction Optimizer that dynamically refines translation prompts based on failure cases, a Demonstration Manager that intelligently selects relevant examples for in-context learning, and a Quality Estimator that evaluates translations with multiple metrics and routes them to the Instruction Optimizer and Demonstration Manager. The resulting system, called DAIL-translation, boosts the low-resource machine translation performance of moderate-sized LLMs ($\sim$7B), larger-scale LLMs ($\sim$70B), and the OpenAI model series, using around 1k parallel data pairs or even only monolingual English sentences.
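To make the interaction of the three components concrete, the sketch below outlines one plausible realization of the self-improving loop described in the abstract. All class and function names (InstructionOptimizer, DemonstrationManager, QualityEstimator, llm_translate, self_improving_pass) and the scoring and selection heuristics are illustrative assumptions, not the paper's actual implementation.

```python
"""Minimal sketch of a DAIL-translation-style self-improving loop.

Everything here is a hypothetical illustration of the described
architecture, not the authors' code or API.
"""
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class QualityEstimator:
    # Combines several metric callables (e.g. chrF- or COMET-style scorers)
    # into one score; the exact metrics are an assumption.
    metrics: List[Callable[[str, str], float]]

    def score(self, hypothesis: str, reference: str) -> float:
        return sum(m(hypothesis, reference) for m in self.metrics) / len(self.metrics)


@dataclass
class DemonstrationManager:
    # Pool of (source, target) pairs; high-quality outputs are added back
    # so later prompts can reuse them as in-context demonstrations.
    pool: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, src: str, tgt: str) -> None:
        self.pool.append((src, tgt))

    def select(self, src: str, k: int = 4) -> List[Tuple[str, str]]:
        # Toy relevance heuristic: token overlap with the source sentence.
        def overlap(pair: Tuple[str, str]) -> int:
            return len(set(src.split()) & set(pair[0].split()))
        return sorted(self.pool, key=overlap, reverse=True)[:k]


@dataclass
class InstructionOptimizer:
    instruction: str = "Translate the following sentence into Ligurian."
    failures: List[Tuple[str, str, str]] = field(default_factory=list)

    def record_failure(self, src: str, hyp: str, ref: str) -> None:
        self.failures.append((src, hyp, ref))

    def refine(self) -> str:
        # Placeholder refinement: a real system would ask the LLM to rewrite
        # the instruction conditioned on the accumulated failure cases.
        if self.failures:
            self.instruction += " Pay attention to previously mistranslated phrases."
            self.failures.clear()
        return self.instruction


def self_improving_pass(parallel_data, llm_translate, estimator, demos, optimizer,
                        threshold: float = 0.5) -> None:
    """One pass over ~1k parallel pairs: translate, score, and route outcomes."""
    for src, ref in parallel_data:
        examples = demos.select(src)
        hyp = llm_translate(optimizer.instruction, examples, src)
        if estimator.score(hyp, ref) >= threshold:
            demos.add(src, hyp)                      # good output becomes a demonstration
        else:
            optimizer.record_failure(src, hyp, ref)  # poor output drives prompt refinement
    optimizer.refine()
```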
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: few-shot/zero-shot MT, continual learning, low-resource languages
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings / efficiency
Languages Studied: English, Friulian, Ligurian, Lombard, Bhojpuri, Chhattisgarhi
Submission Number: 5274