MoSE: Decoupled Tuning for Forgetting-Resilient Multi-task Fine-tuning of LLMs

ICLR 2026 Conference Submission 15792 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: (Large) Language Models; PEFT
TL;DR: We propose MoSE, a mixture-of-experts framework that separates shared and task-specific LoRA experts to improve multi-task and continual fine-tuning of large language models.
Abstract: Integrating Low-Rank Adaptation (LoRA) with Mixture-of-Experts (MoE) has become the mainstream approach for applying LLMs in multi-task scenarios. Existing works assume that different experts can share common knowledge while dynamically holding task-specific information, and they employ a router to select appropriate experts for different tasks. Despite this progress, most existing works still suffer from catastrophic forgetting of both common and task-specific information because they tune the LoRA modules indiscriminately: the information learned in the LoRA modules from previous tasks may be overwritten by the fine-tuning of subsequent tasks. To tackle this problem, we propose a novel Mixture of Shared and Exclusive Experts framework (MoSE) for better multi-task fine-tuning of LLMs. Unlike most existing works, we first separate the LoRA experts into routing experts for task-specific information and shared experts for common knowledge. For routing experts, we develop a feature-wise module that selects the most appropriate experts and tunes their parameters entirely. For shared experts, we aim to preserve as much common knowledge as possible; thus, we design a novel Top-k-selection tuning strategy that selectively fine-tunes certain parameters of the shared experts. We further adopt expert assignment strategies to mitigate task imbalance and ensure fair expert utilization. Extensive experiments over diverse multi-task scenarios demonstrate the effectiveness of the proposed MoSE. Moreover, MoSE exhibits strong continual learning ability, effectively adapting to new tasks while retaining prior knowledge (average 3.3\% and 7.4\% improvements over advanced baselines in sequential continual learning).
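
To make the described architecture concrete, below is a minimal PyTorch sketch of a MoSE-style layer inferred only from this abstract: a frozen base linear layer, an always-active shared LoRA expert for common knowledge, routed exclusive LoRA experts selected per token, and a gradient mask approximating the Top-k-selection tuning of shared experts. All class names, the top-1 routing rule, the rank and expert counts, and the magnitude-based top-k criterion are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """One low-rank adapter: delta_W = B @ A with rank r."""
    def __init__(self, d_in, d_out, r=8):
        super().__init__()
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))

    def forward(self, x):
        return x @ self.A.T @ self.B.T


class MoSELayer(nn.Module):
    """Frozen base layer + always-on shared LoRA expert + routed exclusive experts."""
    def __init__(self, d_in, d_out, n_routed=4, r=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        for p in self.base.parameters():          # pretrained weights stay frozen
            p.requires_grad_(False)
        self.shared = LoRAExpert(d_in, d_out, r)  # holds common knowledge
        self.routed = nn.ModuleList(LoRAExpert(d_in, d_out, r) for _ in range(n_routed))
        self.router = nn.Linear(d_in, n_routed)   # per-token expert scores

    def forward(self, x):
        base_out = self.base(x)
        # Top-1 routing over the exclusive experts (assumed; the paper's
        # feature-wise routing module may differ).
        scores = F.softmax(self.router(x), dim=-1)
        top_w, top_idx = scores.max(dim=-1, keepdim=True)
        routed_out = torch.zeros_like(base_out)
        for i, expert in enumerate(self.routed):
            mask = (top_idx == i).float()
            routed_out = routed_out + mask * top_w * expert(x)
        # Shared expert is always active; routed experts add task-specific deltas.
        return base_out + self.shared(x) + routed_out


def mask_shared_gradients(layer: MoSELayer, keep_ratio=0.1):
    """Illustrative Top-k-selection tuning: after loss.backward(), keep only the
    top-k largest-magnitude gradient entries of the shared expert so most of its
    parameters (the common knowledge) stay untouched. The magnitude criterion is
    an assumption."""
    for p in layer.shared.parameters():
        if p.grad is None:
            continue
        flat = p.grad.abs().flatten()
        k = max(1, int(keep_ratio * flat.numel()))
        thresh = flat.topk(k).values.min()
        p.grad.mul_((p.grad.abs() >= thresh).float())
```

In a training loop under these assumptions, `mask_shared_gradients(layer)` would be called between `loss.backward()` and `optimizer.step()`, so the routed experts are updated fully while only the selected shared-expert parameters change.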
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 15792