Keywords: Large Language Models, Mixture of Experts
Abstract: The Mixture of Experts (MoE) architecture has emerged as a promising solution to reduce computational overhead by selectively activating subsets of model parameters.
The effectiveness of MoE models depends primarily on their routing mechanism, with the widely adopted Top-K scheme determining which experts are activated. However, the Top-K scheme has notable limitations, including unnecessary activations and underutilization of experts.
In this work, rather than modifying the routing mechanism as in previous studies, we propose the Ternary Choice MoE (TC-MoE), a novel approach that expands the expert space by applying the ternary set {-1, 0, 1} to each expert. This expansion enables more efficient and effective expert activation without incurring significant computational cost.
Additionally, given the unique characteristics of the expanded expert space, we introduce a new load balance loss and reward loss to ensure workload balance and achieve a flexible trade-off between effectiveness and efficiency.
Extensive experiments demonstrate that TC-MoE achieves an average improvement of more than 1.1% over traditional approaches, while reducing the average number of activated experts by up to 9%.
These results confirm that TC-MoE mitigates the inefficiencies of conventional routing schemes and offers a more efficient, scalable solution for MoE-based large language models.
Code and models are available at https://github.com/stiger1000/TC-MoE.
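A minimal sketch of the ternary-choice idea described in the abstract, assuming a softmax router over tripled (expert, coefficient) candidates with standard Top-K selection; all class names, shapes, and design details here are illustrative assumptions, not the authors' released implementation:

```python
# Hypothetical sketch: each of E experts is paired with the coefficients
# {-1, 0, +1}, giving 3E routing candidates. Choosing the 0-coefficient
# candidate for an expert means that expert need not be executed, which is
# one way a reduction in activated experts could arise.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TernaryChoiceRouter(nn.Module):
    def __init__(self, d_model: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.num_experts = num_experts
        self.top_k = top_k
        # One logit per (expert, coefficient) pair -> 3 * num_experts candidates.
        self.gate = nn.Linear(d_model, 3 * num_experts, bias=False)
        # Coefficient pattern [-1, 0, +1] repeated once per expert.
        self.register_buffer(
            "coeffs", torch.tensor([-1.0, 0.0, 1.0]).repeat(num_experts)
        )

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, d_model)
        logits = self.gate(x)                           # (num_tokens, 3E)
        probs = F.softmax(logits, dim=-1)
        weights, idx = probs.topk(self.top_k, dim=-1)   # top-k candidates per token
        expert_idx = idx // 3                           # expert each candidate belongs to
        signs = self.coeffs[idx]                        # -1, 0, or +1 for each selection
        # Combined gate value per selected expert; a sign of 0 zeroes out the
        # expert's contribution, so its forward pass can be skipped entirely.
        return expert_idx, signs * weights
```

The load balance and reward losses mentioned above are not sketched here, since their exact form depends on details of the expanded expert space given in the paper.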
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9643