Abstract: Generative Large Language Models (LLMs) and the associated pre-training and fine-tuning paradigms have achieved significant advances in various NLP tasks.
However, Multilingual Neural Machine Translation (MNMT) systems encounter capacity constraints when scaling to numerous languages with fixed model size, resulting in degraded translation quality, particularly for supervised tasks. Furthermore, the scarcity of parallel corpora for non-English language pairs limits expansion to new translation directions.
This paper presents CrossLoRA, a novel MNMT framework that combines Low-Rank Adaptation (LoRA) with a Mixture-of-Experts (MoE) architecture featuring cross-connected language-specific experts. Our approach establishes dedicated experts for individual languages while enabling strategic interaction between source and target language experts during the translation process.
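As a concrete illustration (not the paper's exact formulation), the sketch below shows one plausible way to realize cross-connected language-specific LoRA experts on top of a frozen linear layer in PyTorch. The class name CrossLoRALinear, the per-language (A, B) pairs, and the mix blending coefficient are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossLoRALinear(nn.Module):
    """Illustrative sketch: a frozen linear layer augmented with one LoRA
    expert per language; the source- and target-language experts are
    blended ("cross-connected") when translating. The blending scheme and
    the hyperparameters here are assumptions, not the paper's exact design."""

    def __init__(self, base: nn.Linear, languages, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        d_in, d_out = base.in_features, base.out_features
        self.scaling = alpha / rank
        # One low-rank (A, B) pair per language.
        self.lora_A = nn.ParameterDict(
            {lang: nn.Parameter(torch.randn(rank, d_in) * 0.01) for lang in languages})
        self.lora_B = nn.ParameterDict(
            {lang: nn.Parameter(torch.zeros(d_out, rank)) for lang in languages})

    def expert(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        # Standard LoRA update: x -> B A x, scaled by alpha / rank.
        return (x @ self.lora_A[lang].T) @ self.lora_B[lang].T * self.scaling

    def forward(self, x, src_lang: str, tgt_lang: str, mix: float = 0.5):
        # Cross-connection: combine the source- and target-language experts.
        delta = (1.0 - mix) * self.expert(x, src_lang) + mix * self.expert(x, tgt_lang)
        return self.base(x) + delta


# Toy usage: a German-to-Chinese forward pass through one adapted layer.
layer = CrossLoRALinear(nn.Linear(512, 512), ["en", "de", "zh", "ru", "cs", "is"])
hidden = torch.randn(2, 10, 512)
out = layer(hidden, src_lang="de", tgt_lang="zh")
```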
To achieve any-to-any translation capability, we tailor a two-stage fine-tuning paradigm for the CrossLoRA framework with self-contrastive semantic enhancement: the model is first fine-tuned using English as the pivot language, followed by pseudo-corpus generation and further fine-tuning on the generated data.
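The two-stage paradigm could be organized roughly as follows. This is a hedged sketch: fine_tune, translate, and two_stage_finetune are hypothetical stand-ins for the actual training and decoding routines, and the pivot-based pseudo-corpus construction is one plausible reading of the description above.

```python
from typing import List, Tuple

# (src_lang, src_text, tgt_lang, tgt_text)
Example = Tuple[str, str, str, str]

def fine_tune(model, data: List[Example]):
    """Placeholder: supervised fine-tuning (stage-specific losses,
    e.g. a self-contrastive term, would be added here)."""
    return model

def translate(model, text: str, src_lang: str, tgt_lang: str) -> str:
    """Placeholder: decoding with the current model."""
    return text

def two_stage_finetune(model, english_centric: List[Example],
                       non_english_sources: List[Tuple[str, str, str]]):
    # Stage 1: fine-tune on English-centric pairs (X<->En),
    # i.e. English serves as the pivot language.
    model = fine_tune(model, english_centric)

    # Stage 2: build a pseudo parallel corpus for non-English directions by
    # pivoting through English, then fine-tune again on the generated data.
    pseudo: List[Example] = []
    for src_lang, text, tgt_lang in non_english_sources:
        en = translate(model, text, src_lang, "en")
        tgt = translate(model, en, "en", tgt_lang)
        pseudo.append((src_lang, text, tgt_lang, tgt))
    return fine_tune(model, pseudo)
```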
Experimental results on multilingual translation datasets confirm the translation quality improvements and parameter efficiency of the CrossLoRA framework.
Our findings provide an effective recipe for fine-tuning LLMs to achieve any-to-any translation capability.
Our code is available at: https://anonymous.4open.science/r/CrossL-3FBF/.
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: efficient MT training, multilingual MT
Contribution Types: Approaches to low-resource settings
Languages Studied: Chinese, English, German, Russian, Czech, Icelandic
Keywords: multilingual machine translation, mixture-of-experts, parameter-efficient training, large language model
Submission Number: 4433