Replicate and Quantize: A Plug-and-Play Strategy for Load Balancing in Sparse Mixture-of-Experts LLMs
Keywords: mixture-of-experts; load balance
Abstract: While the rapid growth in model parameters has brought significant benefits to the development of large language models (LLMs), it has also raised computational costs. To address this difficulty, the sparse mixture-of-experts (SMoE) model was introduced, scaling LLMs by activating only a subset of experts per input. How to leverage the knowledge of multiple experts therefore becomes an important question. In the ideal case, a perfectly balanced expert allocation yields an $n$-fold time saving compared to routing every input to a single expert. Thus, in this paper we (1) systematically analyze the performance and functionality of each expert, (2) introduce a metric, based on this observation, to fill the gap in evaluating load balance for SMoE models, and (3) propose a dynamic plug-and-play strategy that is both training-free and near-lossless, effectively resolving the load-balancing problem, in contrast to previous works that focus on training strategies.
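The abstract refers to measuring load balance across experts under top-k routing. As a minimal illustrative sketch only, and not the metric proposed in the paper (which is not specified here), the snippet below counts per-expert token loads from hypothetical router scores and reports a simple imbalance score (coefficient of variation of the loads); all names and shapes are assumptions.

```python
import numpy as np

def expert_loads(router_logits: np.ndarray, top_k: int, num_experts: int) -> np.ndarray:
    """Count how many tokens each expert receives under top-k routing.

    router_logits: array of shape (num_tokens, num_experts) with gating scores.
    """
    # Indices of the top-k experts selected for each token.
    topk_idx = np.argsort(-router_logits, axis=-1)[:, :top_k]
    # Tally tokens assigned to each expert.
    return np.bincount(topk_idx.ravel(), minlength=num_experts)

def imbalance_score(loads: np.ndarray) -> float:
    """Coefficient of variation of expert loads; 0 means perfectly balanced."""
    mean = loads.mean()
    return float(loads.std() / mean) if mean > 0 else 0.0

# Usage with random gating scores (illustrative only).
rng = np.random.default_rng(0)
logits = rng.normal(size=(1024, 8))  # 1024 tokens, 8 experts
loads = expert_loads(logits, top_k=2, num_experts=8)
print("per-expert loads:", loads)
print("imbalance (CV):", imbalance_score(loads))
```

A perfectly balanced router would give every expert the same load and a score of 0; a skewed router concentrates tokens on a few experts and drives the score up, which is the failure mode the paper's plug-and-play strategy targets.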
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2654