Token-Complexity-Based Routing Technique within Mixture-of-Experts Architecture for Large Language Models
Keywords: Mixture of Experts, Large Language Model, Router, Token Complexity Threshold.
TL;DR: A Mixture-of-Experts architecture for Large Language Models that enhances scaling and performance
Abstract: Mixture-of-Experts (MoE) architectures have emerged as a powerful technique for scaling and improving Large Language Models by conditionally activating feed-forward subnetworks within the Transformer layers and distributing tokens through a routing system. However, existing MoE methods often rely on static top-k routing strategies that do not account for token-level variability in complexity, leading to suboptimal expert utilization. In this work, we propose a novel token-complexity-based routing framework that dynamically allocates tokens to either lightweight or strong feed-forward networks (FFNs) based on their estimated complexity. Our router is trained with a few-shot classification objective, together with a surrogate neural network layer, to distinguish easy tokens from complex ones. We evaluate the framework by integrating the router with the Mistral-7B and Llama-2-7B models on benchmarks from several fields; the proposed MoE framework improves accuracy by up to 12% over state-of-the-art MoE architectures at reasonable computational cost.
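The routing idea in the abstract can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the linear complexity scorer, the fixed threshold, and all parameter names (`w_router`, `W1_light`, etc.) are assumptions standing in for the paper's trained router and its surrogate layer. Tokens whose complexity score falls below the threshold go to a small FFN; the rest go to a larger one.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_light, d_heavy = 16, 8, 64  # hidden sizes chosen for illustration only

# Hypothetical FFN parameters: a lightweight expert and a strong (wider) expert.
W1_light = rng.standard_normal((d_model, d_light)) * 0.1
W2_light = rng.standard_normal((d_light, d_model)) * 0.1
W1_heavy = rng.standard_normal((d_model, d_heavy)) * 0.1
W2_heavy = rng.standard_normal((d_heavy, d_model)) * 0.1

# Hypothetical router: a linear scorer producing a per-token complexity estimate.
# In the paper this would be a trained few-shot classifier, not random weights.
w_router = rng.standard_normal(d_model)
threshold = 0.0  # complexity threshold separating "easy" from "complex" tokens

def ffn(x, W1, W2):
    """Two-layer feed-forward network with ReLU activation."""
    return np.maximum(x @ W1, 0.0) @ W2

def route(tokens):
    """Send easy tokens to the lightweight FFN and complex tokens to the strong FFN."""
    scores = tokens @ w_router            # (n_tokens,) complexity scores
    is_complex = scores > threshold       # boolean routing decision per token
    out = np.empty_like(tokens)
    out[~is_complex] = ffn(tokens[~is_complex], W1_light, W2_light)
    out[is_complex] = ffn(tokens[is_complex], W1_heavy, W2_heavy)
    return out, is_complex

tokens = rng.standard_normal((5, d_model))
out, mask = route(tokens)
```

Unlike static top-k routing, each token here activates exactly one expert, and the compute spent on a token scales with its estimated complexity.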
Primary Area: generative models
Submission Number: 13468