Low-resource Neural Machine Translation with Large Language Models: A Continuous Self-Improving System
Abstract: Machine translation systems often struggle to maintain quality in low-resource scenarios due to the lack of sufficient parallel data. We present a novel learning framework that continuously (potentially lifelong) improves the performance of Large Language Models (LLMs) on low-resource machine translation through self-optimization. Our system comprises three key components: an Instruction Optimizer that dynamically refines translation prompts based on failure cases, a Demonstration Manager that intelligently selects relevant examples for in-context learning, and a Quality Estimator that evaluates translations with multiple metrics and routes them to the Instruction Optimizer and Demonstration Manager. The resulting system, called DAIL-translation, boosts the low-resource machine translation performance of moderate-sized LLMs ($\sim$7B), larger-scale LLMs ($\sim$70B), and the OpenAI model series, using around 1k parallel data pairs or even only monolingual English sentences.
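To make the interaction of the three components concrete, the sketch below outlines one plausible realization of the self-improving loop described in the abstract. All class and function names (InstructionOptimizer, DemonstrationManager, QualityEstimator, llm_translate, self_improving_pass) and the scoring and selection heuristics are illustrative assumptions, not the paper's actual implementation.

```python
"""Minimal sketch of a DAIL-translation-style self-improving loop.

Everything here is a hypothetical illustration of the described
architecture, not the authors' code or API.
"""
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class QualityEstimator:
    # Combines several metric callables (e.g. chrF- or COMET-style scorers)
    # into one score; the exact metrics are an assumption.
    metrics: List[Callable[[str, str], float]]

    def score(self, hypothesis: str, reference: str) -> float:
        return sum(m(hypothesis, reference) for m in self.metrics) / len(self.metrics)


@dataclass
class DemonstrationManager:
    # Pool of (source, target) pairs; high-quality outputs are added back
    # so later prompts can reuse them as in-context demonstrations.
    pool: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, src: str, tgt: str) -> None:
        self.pool.append((src, tgt))

    def select(self, src: str, k: int = 4) -> List[Tuple[str, str]]:
        # Toy relevance heuristic: token overlap with the source sentence.
        def overlap(pair: Tuple[str, str]) -> int:
            return len(set(src.split()) & set(pair[0].split()))
        return sorted(self.pool, key=overlap, reverse=True)[:k]


@dataclass
class InstructionOptimizer:
    instruction: str = "Translate the following sentence into Ligurian."
    failures: List[Tuple[str, str, str]] = field(default_factory=list)

    def record_failure(self, src: str, hyp: str, ref: str) -> None:
        self.failures.append((src, hyp, ref))

    def refine(self) -> str:
        # Placeholder refinement: a real system would ask the LLM to rewrite
        # the instruction conditioned on the accumulated failure cases.
        if self.failures:
            self.instruction += " Pay attention to previously mistranslated phrases."
            self.failures.clear()
        return self.instruction


def self_improving_pass(parallel_data, llm_translate, estimator, demos, optimizer,
                        threshold: float = 0.5) -> None:
    """One pass over ~1k parallel pairs: translate, score, and route outcomes."""
    for src, ref in parallel_data:
        examples = demos.select(src)
        hyp = llm_translate(optimizer.instruction, examples, src)
        if estimator.score(hyp, ref) >= threshold:
            demos.add(src, hyp)                      # good output becomes a demonstration
        else:
            optimizer.record_failure(src, hyp, ref)  # poor output drives prompt refinement
    optimizer.refine()
```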
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: few-shot/zero-shot MT, continual learning, low-resource languages
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings / efficiency
Languages Studied: English, Friulian, Ligurian, Lombard, Bhojpuri, Chhattisgarhi
Submission Number: 5274