Expert Margin Optimization: Enhancing Multi-Domain Translation Capabilities of LLM with MoE-LoRA

ACL ARR 2024 June Submission 4755 Authors

16 Jun 2024 (modified: 17 Jul 2024) · ACL ARR 2024 June Submission · License: CC BY 4.0
Abstract: In machine translation with Large Language Models (LLMs), the standard workflow consists of cross-lingual alignment learning followed by instruction tuning. Low-Rank Adaptation (LoRA) is a widely used and effective method for fine-tuning LLMs, but on its own it offers limited benefits in multi-task or multi-domain scenarios. Since multi-domain challenges are pervasive in machine translation, this paper focuses on enhancing the multi-domain translation capabilities of LLMs. We extend LoRA to a Mixture-of-Experts (MoE) architecture, termed MoE-LoRA, to address domain conflicts in multi-domain settings. Our approach introduces MoE-LoRA only at the higher layers to target domain-specific knowledge acquisition, preceded by general cross-lingual alignment during training. In particular, we propose Expert Margin Optimization, a method that transfers complementary knowledge from other domains to strengthen the representations of domain-specific inputs. Experiments on the English-to-German and English-to-Chinese translation directions with the Llama2-7B and Llama3-8B models show consistent improvements in BLEU and COMET scores, demonstrating the efficacy of the proposed approach.
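
The abstract describes MoE-LoRA as a set of LoRA experts combined by a router and attached only to higher transformer layers. The PyTorch sketch below is a minimal illustration of that general idea, not the authors' implementation: the expert count, rank, scaling, router design, and names such as MoELoRALinear, num_experts, and rank are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELoRALinear(nn.Module):
    """Minimal sketch: a frozen linear layer augmented with a mixture of LoRA experts.

    Illustrative only; hyperparameters and router design are assumptions,
    not the configuration used in the paper.
    """

    def __init__(self, base_linear: nn.Linear, num_experts: int = 4,
                 rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():       # keep the pretrained weight frozen
            p.requires_grad = False

        d_in, d_out = base_linear.in_features, base_linear.out_features
        self.scaling = alpha / rank
        # One low-rank (A, B) pair per domain expert.
        self.lora_A = nn.Parameter(torch.randn(num_experts, d_in, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, rank, d_out))
        # Token-level router producing mixture weights over experts.
        self.router = nn.Linear(d_in, num_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_in)
        gate = F.softmax(self.router(x), dim=-1)                      # (b, s, E)
        # Low-rank update from every expert: (b, s, E, d_out)
        expert_out = torch.einsum("bsi,eir,ero->bseo", x, self.lora_A, self.lora_B)
        # Router-weighted combination of the experts' updates.
        lora_out = torch.einsum("bse,bseo->bso", gate, expert_out) * self.scaling
        return self.base(x) + lora_out
```

In this reading, such a module would replace selected linear projections only in the upper layers of the model, so the lower layers carry the general cross-lingual alignment while the experts capture domain-specific knowledge; the Expert Margin Optimization objective described in the abstract would then act on the router/expert outputs, but its exact formulation is not specified here and is not reproduced in this sketch.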
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: Machine Translation, Large Language Model, LoRA, MoE, Margin Loss
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: Chinese, English, German
Submission Number: 4755