GuiLoMo: Allocating Experts and Ranks for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors
Abstract: Parameter-efficient fine-tuning approaches such as Low-Rank Adaptation (LoRA) have been shown to improve training efficiency in Large Language Models (LLMs). Because LoRA introduces only a limited number of parameters, recent research explores integrating LoRA with Mixture-of-Experts (MoE) to enhance performance across diverse tasks. However, for flexible adaptation to downstream tasks, a predetermined, fixed number of experts per layer and a fixed rank for each expert may not be an ideal setting. This highlights the need to determine the optimal Mixture of LoRA Experts (LoRA-MoE) architecture, namely the number of experts in each layer and the rank of each LoRA expert.
In this paper, we adopt Differentiable ARchiTecture Search (DARTS) with a scaling mask to find a fine-grained allocation strategy.
Our analysis reveals that different LoRA fine-tuning architectures affect performance differently across tasks. Based on this, we introduce Adaptively Adjust Mixture of LoRA Experts (ALoRA-MoE), an expert-wise allocation strategy that adaptively optimizes toward the optimal architecture for various downstream tasks. Experiments on three models across nine language processing and reasoning benchmarks demonstrate that ALoRA-MoE achieves performance comparable or superior to all baselines.
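A minimal, illustrative sketch of how a DARTS-style bilevel search with scaling masks over expert and rank selections could be set up in PyTorch is given below. The class and function names (LoRAExpert, SearchableLoRAMoELayer, bilevel_search_step) and all hyperparameters are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAExpert(nn.Module):
    """One LoRA expert: a low-rank update x -> B(Ax), gated by a per-rank scaling mask."""
    def __init__(self, d_in, d_out, max_rank):
        super().__init__()
        self.A = nn.Parameter(torch.randn(max_rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, max_rank))

    def forward(self, x, rank_mask):
        # rank_mask has shape (max_rank,); its soft values decide how many
        # rank directions stay active for this expert during the search.
        return F.linear(F.linear(x, self.A) * rank_mask, self.B)

class SearchableLoRAMoELayer(nn.Module):
    """A LoRA-MoE layer whose expert count and per-expert ranks are searched."""
    def __init__(self, d_in, d_out, max_experts=4, max_rank=8):
        super().__init__()
        self.experts = nn.ModuleList(
            LoRAExpert(d_in, d_out, max_rank) for _ in range(max_experts))
        # Architecture (selection) parameters: one logit per expert and
        # per-expert logits over rank directions; their sigmoids act as the
        # scaling masks during the differentiable search.
        self.expert_logits = nn.Parameter(torch.zeros(max_experts))
        self.rank_logits = nn.Parameter(torch.zeros(max_experts, max_rank))

    def forward(self, x):
        expert_mask = torch.sigmoid(self.expert_logits)   # soft expert selection
        rank_masks = torch.sigmoid(self.rank_logits)      # soft rank selection
        out = 0.0
        for i, expert in enumerate(self.experts):
            out = out + expert_mask[i] * expert(x, rank_masks[i])
        return out

    def arch_parameters(self):
        return [self.expert_logits, self.rank_logits]

    def weight_parameters(self):
        return [p for e in self.experts for p in e.parameters()]

def bilevel_search_step(layer, frozen_linear, train_batch, val_batch,
                        w_opt, arch_opt, loss_fn):
    """One alternating, first-order bilevel step: architecture parameters are
    updated on validation data, LoRA expert weights on training data."""
    x_val, y_val = val_batch
    arch_opt.zero_grad()
    loss_fn(frozen_linear(x_val) + layer(x_val), y_val).backward()
    arch_opt.step()

    x_tr, y_tr = train_batch
    w_opt.zero_grad()
    loss_fn(frozen_linear(x_tr) + layer(x_tr), y_tr).backward()
    w_opt.step()
```

After the search converges, the soft expert and rank masks can be thresholded to a discrete number of experts per layer and a rank per expert, and the resulting architecture is then fine-tuned as usual.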
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: parameter-efficient-training, LLM Efficiency
Contribution Types: Approaches for low compute settings-efficiency
Languages Studied: English
Keywords: parameter-efficient-training, LLM Efficiency
Submission Number: 1108