Keywords: MoE, Mixture-of-Experts, PEFT, LoRA, LLM
TL;DR: Adaptive Expert Prune-and-Grow for Parameter-Efficient MoE Fine-tuning
Abstract: Mixture-of-Experts (MoE) architectures have emerged as a scalable backbone for large language models (LLMs), but their adaptation to downstream tasks remains inefficient due to redundant experts and excessive parameter counts. Parameter-efficient fine-tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) reduce training costs, yet they fail to leverage the dynamic routing signals intrinsic to MoE. We introduce EPnG, an adaptive expert prune-and-grow framework for parameter-efficient MoE fine-tuning. EPnG computes expert importance scores during training to identify under-utilized experts for pruning, while reinforcing high-importance experts by expanding their LoRA ranks with orthogonalized initialization. This adaptive loop reallocates limited trainable parameters to the most impactful experts without increasing the overall budget. On OLMoE and Qwen1.5-MoE, EPnG surpasses LoRA under the same parameter budget (+2.1\%p and +1.4\%p, respectively) on math and code benchmarks, while achieving performance comparable to full fine-tuning with only 0.5–0.7\% of the parameters ($\approx$ 150× fewer). These results underscore the effectiveness of coupling MoE’s conditional computation with adaptive PEFT for scalable fine-tuning.
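The abstract describes a prune-and-grow loop over per-expert LoRA adapters: score experts, drop adapters on under-utilized ones, and reinvest the freed rank budget into high-importance experts with orthogonalized initialization. The paper's exact importance score and schedule are not given here, so the following is only a minimal sketch under assumed choices (importance = accumulated router probability mass; `prune_and_grow` and `orthogonal_rank_expand` are hypothetical helper names, not the authors' API):

```python
import numpy as np

def prune_and_grow(importance, ranks, budget, grow_step=2, prune_frac=0.25):
    """One hypothetical EPnG-style reallocation step.

    importance: per-expert scores (assumed: router probability mass
    accumulated during training). ranks: current LoRA rank per expert.
    budget: total rank budget kept fixed, as in the abstract.
    """
    n = len(importance)
    order = np.argsort(importance)            # ascending importance
    new_ranks = ranks.copy()
    # Prune: remove adapters from the least-important fraction of experts.
    for i in order[: int(n * prune_frac)]:
        new_ranks[i] = 0
    # Grow: expand ranks of the most important experts within the budget.
    for i in order[::-1]:
        if new_ranks.sum() + grow_step > budget:
            break
        if new_ranks[i] > 0:
            new_ranks[i] += grow_step
    return new_ranks

def orthogonal_rank_expand(A, extra_rank, rng):
    """Append `extra_rank` new rows to a LoRA factor A (shape r x d),
    orthogonalized against A's existing rows so the added directions
    start disjoint from what the adapter has already learned."""
    new = rng.standard_normal((extra_rank, A.shape[1]))
    Q, _ = np.linalg.qr(A.T)                  # orthonormal basis of A's row space
    new = new - (new @ Q) @ Q.T               # project out existing directions
    new /= np.linalg.norm(new, axis=1, keepdims=True)
    return np.vstack([A, new])
```

For example, with scores `[0.1, 0.9, 0.5, 0.05]`, uniform rank 4, and budget 16, the step zeroes out the least-used expert and raises the two most-used experts to rank 6, keeping the total at 16.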
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 5783