Free-MoE: Tuning-Free Mixture-of-Experts Purifying LLMs to Thrive across Any Field

ICLR 2025 Conference Submission 458 Authors

13 Sept 2024 (modified: 13 Oct 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Mixture of Experts, Pretrained LLMs
Abstract: The Mixture-of-Experts (MoE) framework efficiently scales large language models (LLMs) by selectively activating expert subnetworks, reducing computational costs. However, current MoE methods are computationally costly and introduce additional expert modules that require extra training data for tuning, leading to instability in the optimization process. To address these issues, we introduce Free-MoE, a tuning-free MoE method that leverages pre-trained LLMs' inherent ability to generalize across a wide range of tasks and domains. Free-MoE dynamically activates experts according to the input domain and achieves improvements while 1) requiring no extra model parameters and 2) being completely tuning-free. Specifically, we design DOWP, a Domain-Oriented Weight Purification algorithm that purifies the weights in hidden layers and selects the optimal domain-specific experts within the hidden layers of the LLM to optimize activation decisions. The activated DSS-Experts (Domain-Specific Subnetwork Experts) can thereby concentrate on specialized task generation, outperforming the corresponding original model. Moreover, Free-MoE incorporates a multi-level trainable router that activates only the most relevant subnetworks during task execution, effectively minimizing unnecessary inference computation. Comprehensive evaluations reveal that the DOWP algorithm consistently achieves general performance gains of 2% to 3%, reaching up to 6.8%, across datasets such as MMLU, HumanEval, and GSM8K. Additionally, when integrated into the Free-MoE framework, our method demonstrates a cumulative improvement of 1.11% on average. These findings indicate that Free-MoE not only enhances overall computational efficiency but also improves the model's adaptability across the fields covered by contemporary language-generation benchmarks, and it can be seamlessly applied to any Transformer-based LLM. Code for this project will be released in the near future.
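To make the abstract's idea concrete, below is a minimal, hypothetical PyTorch sketch of how a tuning-free "domain-specific subnetwork expert" could be carved out of a frozen pretrained layer via a binary weight mask, and how a simple prototype-based router might pick among such subnetworks at inference. The names (`purify_weights`, `DSSExpert`, `DomainRouter`), the top-k importance heuristic, and the similarity-based routing are all illustrative assumptions; they are not the paper's actual DOWP algorithm or multi-level router, which the abstract does not specify in detail.

```python
# Illustrative sketch only: a frozen layer restricted to a domain-specific
# subnetwork by a binary mask (no new parameters, no tuning), plus a toy
# router that selects the best-matching subnetwork for an input.
import torch
import torch.nn as nn


def purify_weights(linear: nn.Linear, importance: torch.Tensor,
                   keep_ratio: float = 0.7) -> torch.Tensor:
    """Keep the fraction `keep_ratio` of output units with the highest
    (assumed) domain-importance scores; return a boolean row mask."""
    k = max(1, int(keep_ratio * linear.out_features))
    top = torch.topk(importance, k).indices
    mask = torch.zeros(linear.out_features, dtype=torch.bool)
    mask[top] = True
    return mask


class DSSExpert(nn.Module):
    """A frozen pretrained layer restricted to a domain-specific subnetwork."""

    def __init__(self, linear: nn.Linear, row_mask: torch.Tensor):
        super().__init__()
        self.linear = linear                      # shared frozen weights
        self.register_buffer("row_mask", row_mask.float())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Zero the outputs of off-domain units; nothing is fine-tuned.
        return self.linear(x) * self.row_mask


class DomainRouter(nn.Module):
    """Toy router: route to the expert whose domain prototype is most
    similar to the (mean-pooled) input representation."""

    def __init__(self, prototypes: torch.Tensor, experts: nn.ModuleList):
        super().__init__()
        self.register_buffer("prototypes", prototypes)  # (num_domains, d_in)
        self.experts = experts

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = x.mean(dim=0) @ self.prototypes.T      # one score per domain
        return self.experts[int(scores.argmax())](x)


if __name__ == "__main__":
    torch.manual_seed(0)
    base = nn.Linear(16, 32)
    for p in base.parameters():
        p.requires_grad_(False)

    # Fake per-unit importance scores for two hypothetical domains.
    importances = [torch.rand(32), torch.rand(32)]
    experts = nn.ModuleList(
        DSSExpert(base, purify_weights(base, imp)) for imp in importances
    )
    router = DomainRouter(prototypes=torch.randn(2, 16), experts=experts)
    print(router(torch.randn(4, 16)).shape)  # torch.Size([4, 32])
```

The key design point mirrored here is that every "expert" shares the original pretrained weights and differs only in which units are kept active, which is why no extra parameters or tuning data are required under this reading of the abstract.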
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 458