Rank-efficient Mixture of Experts for LLM Finetuning

20 Sept 2025 (modified: 21 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: rank-efficient, moe, llm, finetuning, mpo, lora
TL;DR: We propose two methods, SharedLoRA and OperA, that increase the rank efficiency of PEFT MoEfication. Both methods outperform the SoTA across multiple models while always using fewer parameters, and also surpass it in effective rank.
Abstract: Large language models (LLMs) have achieved impressive results in many general-purpose domains, but their performance on specific tasks can still be improved through finetuning. Parameter-efficient finetuning (PEFT) tailors an LLM to one or more tasks with a small number of trainable parameters, requiring reduced computational resources. On the one hand, techniques like low-rank adaptation (LoRA) provide the required parameter efficiency with adapters of low, fixed rank, which also limits their flexibility. On the other hand, mixtures of experts (MoEs) enhance the flexibility of a model at the cost of an increased parameter count and computational budget. The combination of the two approaches, parameter-efficient MoEfication, has shown promise in addressing the issues of both. In this work, we propose two methods that improve the rank efficiency of PEFT adapters, increasing the flexibility and reducing the number of parameters involved in MoEfication. First, SharedLoRA retains the additive nature of LoRA by using a two-tier structure of adapters, thereby increasing the effective rank of the adapter while also reducing its size. Second, OperA replaces additive with quantum-inspired multiplicative interactions to further increase rank efficiency while reducing the number of parameters. We show that both techniques match or surpass the state-of-the-art (SoTA) in its commonly used setup on 6 open-source frontier LLMs and 7 tasks, while using notably fewer parameters. Moreover, OperA is optimal under the same parameter budget for 5 out of 6 models considered, while always using fewer parameters than the baseline. Finally, we provide evidence for the superior performance of our methods by analyzing the effective rank of the adapters: SharedLoRA nearly doubles the rank of the SoTA solution, while OperA's rank is more than two orders of magnitude greater.
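A minimal sketch of the rank argument behind these claims (the notation and the specific multiplicative form below are our illustration, not necessarily the paper's exact construction): a single LoRA adapter updates a weight matrix $W \in \mathbb{R}^{d \times d}$ additively,

$$\Delta W = BA, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times d}, \qquad \operatorname{rank}(\Delta W) \le r,$$

so a sum of $k$ such adapters, as in additive MoEfication, satisfies $\operatorname{rank}\big(\sum_{i=1}^{k} B_i A_i\big) \le kr$; the attainable effective rank grows only linearly in the parameter count. A multiplicative composition of low-rank-plus-identity factors,

$$\Delta W = \prod_{i=1}^{k} (I + B_i A_i) - I,$$

expands into a sum over all nonempty subsets of the factors, each term having rank at most $r$, giving $\operatorname{rank}(\Delta W) \le (2^{k}-1)\,r$. For the same number of trainable parameters, the attainable effective rank can thus grow exponentially in the number of factors, which is consistent with the abstract's claim that OperA's effective rank exceeds the additive baseline by more than two orders of magnitude.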
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 24209