Abstract: Mixture of Experts (MoE) has great potential for scaling up the capacity of large models while maintaining low computational costs. Recent works have focused on reducing expert-level redundancy by designing various token allocation strategies within gating functions. However, the intricate internal relationships between experts cause knowledge redundancy at the fine-grained neuron level, and research on collaboration among experts remains scarce.
In this paper, we propose an Information Bottleneck based MoE (IBMoE) for parameter-efficient fine-tuning, which reduces neuron-level redundancy within each expert and fosters internal collaboration among all experts. Specifically, a sparse neuronal activation strategy is introduced to dynamically activate relevant neurons and reduce redundancy when processing different tasks. In addition, a diversity constraint is imposed among experts, which maximizes the knowledge differences between them and enables all experts to cooperate more efficiently.
Extensive experiments demonstrate the advantages of our method: it achieves superior performance while reducing inference time by 63\% and memory consumption by 48.5\% compared to recent baselines. Our code will be made publicly available.
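To make the two ideas in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it illustrates one plausible reading of "sparse neuronal activation" as a per-token top-k mask over an expert's hidden neurons, and of the "diversity constraint" as a pairwise cosine-similarity penalty over expert outputs. All names (SparseExpert, diversity_loss, k) and design details are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseExpert(nn.Module):
    """Feed-forward expert that keeps only the k most relevant hidden neurons
    per token (illustrative stand-in for the sparse activation strategy)."""
    def __init__(self, d_model: int, d_hidden: int, k: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = F.gelu(self.up(x))                          # hidden activations
        # Keep the k largest-magnitude activations per token; zero the rest.
        topk = h.abs().topk(self.k, dim=-1)
        mask = torch.zeros_like(h).scatter_(-1, topk.indices, 1.0)
        return self.down(h * mask)

def diversity_loss(expert_outputs: torch.Tensor) -> torch.Tensor:
    """Mean pairwise cosine similarity between expert output representations
    (illustrative stand-in for the diversity constraint; lower = more diverse).
    expert_outputs: (num_experts, batch, d_model)."""
    e = F.normalize(expert_outputs.flatten(1), dim=-1)  # (E, batch * d_model)
    sim = e @ e.t()                                     # (E, E) cosine similarities
    off_diag = sim - torch.diag_embed(torch.diagonal(sim))
    num_experts = e.size(0)
    return off_diag.sum() / (num_experts * (num_experts - 1))
```

In training, the diversity term would be added to the task loss with a weighting coefficient so that experts are pushed toward complementary rather than overlapping knowledge; the actual objective and masking mechanism used by IBMoE are specified in the paper itself.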
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: representation learning
Contribution Types: Approaches to low-resource settings, Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 6966