GuiLoMo: Allocating Experts and Ranks for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors
Abstract: Parameter-efficient fine-tuning approaches such as Low-Rank Adaptation (LoRA) have been shown to improve training efficiency in Large Language Models (LLMs). Because LoRA introduces only a limited number of parameters, recent research explores integrating LoRA with Mixture-of-Experts (MoE) to enhance performance across diverse tasks. However, for flexible adaptation to downstream tasks, a predetermined, fixed number of experts per layer and a fixed rank for each expert may not be an ideal setting. This highlights the need to determine the optimal Mixture of LoRA Experts (LoRA-MoE) architecture, namely the number of experts in each layer and the rank of each LoRA expert.
In this paper, we adopt Differentiable ARchiTecture Search (DARTS) with a scaling mask to find a fine-grained allocation strategy.
Our analysis reveals that different LoRA fine-tuning architectures affect performance differently across tasks. Based on this, we introduce Adaptively Adjust Mixture of LoRA Experts (ALoRA-MoE), an expert-wise allocation strategy that adaptively optimizes toward the optimal architecture for various downstream tasks. Experiments on three models across nine language processing and reasoning benchmarks demonstrate that ALoRA-MoE achieves performance comparable or superior to all baselines.
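A minimal, illustrative sketch of how a DARTS-style bilevel search with scaling masks over expert and rank selections could be set up in PyTorch is given below. The class and function names (LoRAExpert, SearchableLoRAMoELayer, bilevel_search_step) and all hyperparameters are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAExpert(nn.Module):
    """One LoRA expert: a low-rank update x -> B(Ax), gated by a per-rank scaling mask."""
    def __init__(self, d_in, d_out, max_rank):
        super().__init__()
        self.A = nn.Parameter(torch.randn(max_rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, max_rank))

    def forward(self, x, rank_mask):
        # rank_mask has shape (max_rank,); its soft values decide how many
        # rank directions stay active for this expert during the search.
        return F.linear(F.linear(x, self.A) * rank_mask, self.B)

class SearchableLoRAMoELayer(nn.Module):
    """A LoRA-MoE layer whose expert count and per-expert ranks are searched."""
    def __init__(self, d_in, d_out, max_experts=4, max_rank=8):
        super().__init__()
        self.experts = nn.ModuleList(
            LoRAExpert(d_in, d_out, max_rank) for _ in range(max_experts))
        # Architecture (selection) parameters: one logit per expert and
        # per-expert logits over rank directions; their sigmoids act as the
        # scaling masks during the differentiable search.
        self.expert_logits = nn.Parameter(torch.zeros(max_experts))
        self.rank_logits = nn.Parameter(torch.zeros(max_experts, max_rank))

    def forward(self, x):
        expert_mask = torch.sigmoid(self.expert_logits)   # soft expert selection
        rank_masks = torch.sigmoid(self.rank_logits)      # soft rank selection
        out = 0.0
        for i, expert in enumerate(self.experts):
            out = out + expert_mask[i] * expert(x, rank_masks[i])
        return out

    def arch_parameters(self):
        return [self.expert_logits, self.rank_logits]

    def weight_parameters(self):
        return [p for e in self.experts for p in e.parameters()]

def bilevel_search_step(layer, frozen_linear, train_batch, val_batch,
                        w_opt, arch_opt, loss_fn):
    """One alternating, first-order bilevel step: architecture parameters are
    updated on validation data, LoRA expert weights on training data."""
    x_val, y_val = val_batch
    arch_opt.zero_grad()
    loss_fn(frozen_linear(x_val) + layer(x_val), y_val).backward()
    arch_opt.step()

    x_tr, y_tr = train_batch
    w_opt.zero_grad()
    loss_fn(frozen_linear(x_tr) + layer(x_tr), y_tr).backward()
    w_opt.step()
```

After the search converges, the soft expert and rank masks can be thresholded to a discrete number of experts per layer and a rank per expert, and the resulting architecture is then fine-tuned as usual.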
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: parameter-efficient-training, LLM Efficiency
Contribution Types: Approaches for low compute settings-efficiency
Languages Studied: English
Keywords: parameter-efficient-training, LLM Efficiency
Submission Number: 1108