FREE-MoE: Tuning-Free Mixture-of-Experts Purifying LLMs to Thrive across Any Field

ACL ARR 2025 February Submission8354 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Scaling up pre-trained large language models entails increasing parameters and associated costs, while reducing parameters generally degrades performance. Forming pre-trained large language models into a Mixture of Experts (MoE) architecture shows promising potential, as it can both reduce tuning requirements and improve performance. In this work, we introduce FREE-MoE, which leverages the inherent generalization ability of pretrained LLMs across multiple tasks and domains and applies weight purification to obtain Domain-Specific Subnetwork Experts. This method achieves performance improvements while 1) requiring no additional model parameters and 2) being completely tuning-free for the experts. Specifically, we design the Domain-Oriented Weight Purification (DOWP) algorithm, which purifies the weights in the hidden layers of the pretrained LLM that are irrelevant to the input domain, forming domain-specific subnetwork experts. Additionally, FREE-MoE incorporates a multi-level trainable router that integrates DOWP into the pretrained LLM, ensuring that only the most relevant subnetworks are activated. Findings show that FREE-MoE not only improves the model's adaptability across the domains covered by contemporary language generation benchmarks, but can also be seamlessly applied to any Transformer-based LLM.
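
To make the abstract's idea concrete, here is a minimal sketch, in PyTorch, of how domain-oriented weight purification and routing could look: a 0/1 mask over a pretrained layer's weights forms a domain-specific subnetwork expert, and a small trainable router picks which mask to activate. This is not the authors' DOWP implementation; the salience criterion (|W| scaled by mean activation magnitude), the `keep_ratio` parameter, the hard top-1 routing, and all names below are illustrative assumptions.

```python
import torch
import torch.nn as nn

def purify_weights(layer: nn.Linear, domain_acts: torch.Tensor, keep_ratio: float = 0.7):
    """Return a 0/1 mask keeping the top `keep_ratio` fraction of weights by salience.

    domain_acts: calibration activations for one domain, shape [num_samples, in_features].
    The salience score here (|W[i, j]| * mean |x_j|) is an assumed stand-in for the
    paper's purification criterion, which the abstract does not specify.
    """
    salience = layer.weight.abs() * domain_acts.abs().mean(dim=0)
    k = max(1, int(keep_ratio * salience.numel()))
    threshold = salience.flatten().kthvalue(salience.numel() - k + 1).values
    return (salience >= threshold).float()

class RoutedLinear(nn.Module):
    """Wraps a frozen pretrained linear layer with per-domain weight masks and a router."""

    def __init__(self, layer: nn.Linear, domain_masks: list[torch.Tensor]):
        super().__init__()
        self.layer = layer            # pretrained weights stay frozen: experts are tuning-free
        self.masks = domain_masks     # one purified subnetwork mask per domain
        # The router is the only trainable component in this sketch.
        self.router = nn.Linear(layer.in_features, len(domain_masks))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq_len, in_features]; route on the mean-pooled representation
        # and, for simplicity, apply one domain mask to the whole batch.
        domain = self.router(x.mean(dim=-2)).argmax(dim=-1)
        w = self.layer.weight * self.masks[int(domain[0])]   # activate one subnetwork expert
        return nn.functional.linear(x, w, self.layer.bias)
```

In this sketch, purification costs no extra parameters beyond the binary masks, and only the router would be trained, mirroring the "no additional model parameters" and "tuning-free experts" properties claimed in the abstract.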
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Efficient/Low-Resource Methods for NLP
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 8354