PASs-MoE: Mitigating Misaligned Co-drift between Router and Experts via Pathway Activation Subspaces for Continual Learning

ACL ARR 2026 January Submission10435 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · License: CC BY 4.0
Keywords: Multimodal Large Language Models, Parameter-Efficient Fine-Tuning, Mixture-of-Experts, Continual Learning
Abstract: Continual instruction tuning (CIT) requires multimodal large language models (MLLMs) to adapt to a stream of tasks without forgetting prior capabilities. A common strategy is to isolate updates by routing inputs to different LoRA experts. However, existing LoRA-based mixture-of-experts (MoE) methods often jointly update the router and experts in an indiscriminate way, causing the router's preferences to co-drift with the experts' adaptation pathways and gradually deviate from early-stage input–expert specialization. We term this \emph{\textbf{Misaligned Co-drift}}: it blurs expert responsibilities and exacerbates forgetting. To address this, we introduce \emph{\textbf{pathway activation subspaces (PASs)}}, LoRA-induced subspaces that reflect which low-rank pathway directions an input activates in each expert, providing a capability-aligned coordinate system for routing and preservation. Based on PASs, we propose a fixed-capacity PASs-based MoE–LoRA method with two components: PAS-guided Reweighting, which calibrates routing using each expert's pathway activation signals, and PAS-aware Rank Stabilization, which selectively stabilizes rank directions important to previous tasks. Experiments on a CIT benchmark show that our approach consistently outperforms a range of conventional continual learning baselines and MoE–LoRA variants in both accuracy and anti-forgetting without increasing the model's parameter count. Source code will be released upon acceptance.
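To make the routing component concrete, below is a minimal PyTorch sketch of PAS-guided Reweighting for a single MoE–LoRA layer. The class and variable names, and the exact calibration rule (adding a log-scaled pathway-activation norm to the router logits), are illustrative assumptions based only on the abstract, not the paper's actual formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PASMoELoRALinear(nn.Module):
    """Hypothetical sketch: a frozen linear layer with E LoRA experts whose
    router logits are calibrated by each expert's pathway activation signal."""

    def __init__(self, d_in: int, d_out: int, num_experts: int = 4,
                 rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)            # frozen backbone weight
        # Per-expert LoRA factors: A_e projects inputs into the rank-r pathway
        # subspace; B_e maps pathway activations back to the output space.
        self.A = nn.Parameter(torch.randn(num_experts, rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, d_out, rank))
        self.router = nn.Linear(d_in, num_experts)
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, d_in)
        # Pathway activation h_e = A_e x: which of expert e's low-rank
        # directions the input activates.                  -> (batch, E, r)
        h = torch.einsum('erd,bd->ber', self.A, x)
        # Scalar activation signal per expert: norm over the rank directions.
        pas_signal = h.norm(dim=-1)                       # (batch, E)
        # Calibrate router logits with the detached PAS signal, steering
        # routing toward experts whose pathways the input actually excites
        # (one possible reading of "PAS-guided Reweighting").
        logits = self.router(x) + torch.log1p(pas_signal.detach())
        gate = F.softmax(logits, dim=-1)                  # (batch, E)
        # Expert outputs B_e (A_e x), mixed by the calibrated gate.
        expert_out = torch.einsum('eor,ber->beo', self.B, h)
        delta = (gate.unsqueeze(-1) * expert_out).sum(dim=1)
        return self.base(x) + self.scale * delta
```

PAS-aware Rank Stabilization could analogously be sketched as a regularization term that penalizes updates along rank directions with high accumulated activation on previous tasks, though the paper's concrete mechanism may differ.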
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: multimodality, vision question answering, continual learning, parameter-efficient-training
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English, Chinese
Submission Number: 10435