Abstract: Large Language Models (LLMs) have achieved impressive results across numerous NLP tasks, and fine-tuning them for Machine Translation (MT) has improved their performance.
However, vanilla fine-tuning often leads to catastrophic forgetting, compromising the broad general abilities of LLMs and introducing potential security risks. Because these abilities are developed on proprietary, unavailable training data, simple data-replay methods are ineffective. To overcome this issue, we propose a novel approach called $\textbf{RaDis}$ ($\textbf{Ra}$tionale $\textbf{Dis}$tillation). RaDis harnesses the strong generative capabilities of LLMs to create rationales for training data, which are then “replayed” to prevent forgetting. These rationales connect prior knowledge with new tasks, acting as self-distillation targets that regulate the training process. By jointly training on reference translations and self-generated rationales, the model learns new translation skills while preserving its general abilities across other tasks. Additionally, RaDis offers a fresh perspective on using rationales in the continual learning (CL) field and has the potential to serve as a general CL method for a variety of tasks.
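The core idea of jointly training on reference translations and self-generated rationales can be illustrated with a minimal sketch. The function names, prompt template, and stubbed rationale below are assumptions for illustration, not the paper's actual implementation:

```python
# Hypothetical sketch of RaDis-style training-example construction.
# The rationale generator is stubbed; in practice the LLM itself
# would generate it before fine-tuning begins.

def generate_rationale(source: str, reference: str) -> str:
    """Stub for the LLM's self-generated rationale for a
    (source, reference) translation pair (assumed interface)."""
    return f"The translation preserves the meaning of '{source}'."

def build_training_target(reference: str, rationale: str) -> str:
    """Concatenate the reference translation with the rationale.
    The reference supervises the new MT skill; the rationale acts
    as a replayed self-distillation signal that helps preserve the
    model's general abilities during fine-tuning."""
    return f"{reference}\n\nRationale: {rationale}"

source = "Guten Morgen"
reference = "Good morning"
rationale = generate_rationale(source, reference)
target = build_training_target(reference, rationale)
```

The concatenated `target` would then serve as the supervision sequence for a standard language-modeling loss, so no separate distillation objective or external teacher is required.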
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: large language model for MT, continual learning
Contribution Types: NLP engineering experiment
Languages Studied: English, Czech, German, Russian, Chinese
Submission Number: 6390