FURINA: Free from Unmergeable Router via lINear Aggregation of mixed experts

15 Sept 2025 (modified: 20 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: parameter-efficient fine-tuning, low-rank adaptation, Mixture-of-Experts
TL;DR: In this work, we propose FURINA, which enables router-free, mergeable MoE-LoRA fine-tuning with zero additional inference overhead, matching MoE-LoRA performance without added implementation complexity.
Abstract: The Mixture of Experts (MoE) paradigm has been successfully integrated into Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning (PEFT), delivering performance gains with minimal parameter overhead. However, a key limitation of existing MoE-LoRA methods is their reliance on a discrete router, which prevents the integration of the MoE components into the backbone model. This results in persistent computational overhead and increased system complexity during inference. To overcome this, we propose **FURINA**, a novel **F**ree from **U**nmergeable **R**outer framework based on the L**IN**ear **A**ggregation of experts. FURINA eliminates the router by introducing a Self-Routing mechanism. This is achieved through three core innovations: (1) decoupled learning of the direction and magnitude for LoRA adapters, (2) a shared learnable magnitude vector for consistent activation scaling, and (3) an expert selection loss that encourages divergent expert activation. The proposed mechanism leverages the angular similarity between the input and each adapter's directional component to activate experts, which are then scaled by the shared magnitude vector. This design allows the output norm to naturally reflect the importance of each expert, thereby enabling dynamic, router-free routing. The expert selection loss further sharpens this behavior by encouraging sparsity and aligning it with standard MoE activation patterns. A challenge that arises from Self-Routing is a potential reduction in output norms, which could limit the overall model capacity. To mitigate this, we introduce a shared expert within the MoE-LoRA block that provides stable, foundational knowledge. To the best of our knowledge, FURINA is the first router-free, MoE-enhanced LoRA method that can be fully merged into the backbone model, introducing zero additional inference-time cost or complexity. Extensive experiments demonstrate that FURINA not only significantly outperforms standard LoRA but also matches or surpasses the performance of existing MoE-LoRA methods, while eliminating the extra inference-time overhead of MoE.
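To make the Self-Routing idea in the abstract concrete, the following is a minimal PyTorch sketch of how a router-free MoE-LoRA block along these lines might look: each expert's down-projection is normalized to a pure direction so the input's angular alignment with it acts as the routing signal, a shared learnable magnitude vector rescales all experts consistently, and a shared expert supplies a stable base adaptation. All class, parameter, and tensor names here are illustrative assumptions, not the authors' implementation, and the expert selection loss is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfRoutingLoRABlock(nn.Module):
    """Illustrative router-free MoE-LoRA block (hypothetical, not the paper's code).

    Each expert is a low-rank adapter whose down-projection rows are
    L2-normalized into pure directions; the strength with which an expert
    fires is then governed by the angular similarity between the input and
    those directions, scaled by a single magnitude vector shared across
    experts. A shared expert adds a stable, foundational adaptation.
    """

    def __init__(self, d_in: int, d_out: int, rank: int = 8, n_experts: int = 4):
        super().__init__()
        # Directional (low-rank) components of each expert.
        self.A = nn.Parameter(torch.randn(n_experts, rank, d_in) * 0.02)  # down-projections
        self.B = nn.Parameter(torch.zeros(n_experts, d_out, rank))        # up-projections
        # Shared learnable magnitude vector for consistent activation scaling.
        self.magnitude = nn.Parameter(torch.ones(rank))
        # Shared expert providing stable base knowledge.
        self.shared_A = nn.Linear(d_in, rank, bias=False)
        self.shared_B = nn.Linear(rank, d_out, bias=False)
        nn.init.zeros_(self.shared_B.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_in)
        # Normalize each expert's down-projection rows -> directions only.
        A_dir = F.normalize(self.A, dim=-1)                     # (E, r, d_in)
        # Angular similarity between input and each direction is the routing signal.
        h = torch.einsum('bd,erd->ber', x, A_dir)               # (batch, E, r)
        h = h * self.magnitude                                  # shared magnitude scaling
        expert_out = torch.einsum('ber,eor->beo', h, self.B)    # (batch, E, d_out)
        # Linear aggregation: no discrete router, just a sum in which each
        # expert's output norm reflects how strongly x aligns with it.
        out = expert_out.sum(dim=1)
        # Shared expert output added on top.
        return out + self.shared_B(self.shared_A(x))
```

Because every expert's contribution is a fixed linear map of the input (no input-dependent gating network), such a block can in principle be folded back into the backbone weight after training, which is the merging property the abstract emphasizes.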
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 5315