Keywords: MoE, Mixture-of-Experts, PEFT, LoRA, LLM
TL;DR: Adaptive Expert Prune-and-Grow for Parameter-Efficient MoE Fine-tuning
Abstract: Mixture-of-Experts (MoE) architectures have emerged as a scalable backbone for large language models (LLMs), but their adaptation to downstream tasks remains inefficient due to redundant experts and excessive parameter counts. Parameter-efficient fine-tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) reduce training costs, yet they fail to leverage the dynamic routing signals intrinsic to MoE. We introduce EPnG, an adaptive expert prune-and-grow framework for parameter-efficient MoE fine-tuning. EPnG computes expert importance scores during training to identify under-utilized experts for pruning, while reinforcing high-importance experts by expanding their LoRA ranks with orthogonalized initialization. This adaptive loop reallocates limited trainable parameters to the most impactful experts without increasing the overall budget. On OLMoE and Qwen1.5-MoE, EPnG surpasses LoRA under the same parameter budget (+2.1\%p and +1.4\%p, respectively) on math and code benchmarks, while achieving performance comparable to full fine-tuning with only 0.5–0.7\% of the parameters ($\approx$ 150× fewer). These results underscore the effectiveness of coupling MoE’s conditional computation with adaptive PEFT for scalable fine-tuning.
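The abstract describes a prune-and-grow loop over per-expert LoRA adapters: score experts, drop adapters on under-utilized ones, and reinvest the freed rank budget into high-importance experts with orthogonalized initialization. The paper's exact importance score and schedule are not given here, so the following is only a minimal sketch under assumed choices (importance = accumulated router probability mass; `prune_and_grow` and `orthogonal_rank_expand` are hypothetical helper names, not the authors' API):

```python
import numpy as np

def prune_and_grow(importance, ranks, budget, grow_step=2, prune_frac=0.25):
    """One hypothetical EPnG-style reallocation step.

    importance: per-expert scores (assumed: router probability mass
    accumulated during training). ranks: current LoRA rank per expert.
    budget: total rank budget kept fixed, as in the abstract.
    """
    n = len(importance)
    order = np.argsort(importance)            # ascending importance
    new_ranks = ranks.copy()
    # Prune: remove adapters from the least-important fraction of experts.
    for i in order[: int(n * prune_frac)]:
        new_ranks[i] = 0
    # Grow: expand ranks of the most important experts within the budget.
    for i in order[::-1]:
        if new_ranks.sum() + grow_step > budget:
            break
        if new_ranks[i] > 0:
            new_ranks[i] += grow_step
    return new_ranks

def orthogonal_rank_expand(A, extra_rank, rng):
    """Append `extra_rank` new rows to a LoRA factor A (shape r x d),
    orthogonalized against A's existing rows so the added directions
    start disjoint from what the adapter has already learned."""
    new = rng.standard_normal((extra_rank, A.shape[1]))
    Q, _ = np.linalg.qr(A.T)                  # orthonormal basis of A's row space
    new = new - (new @ Q) @ Q.T               # project out existing directions
    new /= np.linalg.norm(new, axis=1, keepdims=True)
    return np.vstack([A, new])
```

For example, with scores `[0.1, 0.9, 0.5, 0.05]`, uniform rank 4, and budget 16, the step zeroes out the least-used expert and raises the two most-used experts to rank 6, keeping the total at 16.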
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 5783