Keywords: DoE, knowledge block, expert decoupling
TL;DR: Our Decoupling of Experts (DoE) architecture uses a two-stage (LDA -> VAE) process to create dynamic 'Knowledge Block' experts from the attention Key/Value matrices. It replaces MoE routers and softmax gating with an attention-based gate (AGC), achieving layer-wise specialization and efficient scaling.
Abstract: Current large language models (LLMs), particularly Mixture-of-Experts (MoE) variants, face challenges in achieving efficient, structured, and interpretable scaling. We introduce the Decoupling of Experts (DoE) architecture, a novel framework that addresses these limitations by grounding computation in a hierarchically organized and dynamically updated knowledge space. Our methodology follows a two-stage lifecycle: first, Latent Dirichlet Allocation (LDA) builds a semantic topic foundation from the training corpus; this knowledge is then integrated into the main LLM, where it is dynamically refined. Critically, we discard traditional, static MoE experts. Instead, each expert is a dynamic \textbf{Knowledge Block} synthesized on the fly by reusing the Key and Value matrices from the attention computation. We replace the standard load balancer and softmax gating with an \textbf{Attention Gating Control (AGC)} module that employs a VAE-based router with ReLU activation for expert composition. The entire process is optimized with a composite loss function that balances next-token prediction with a KL-divergence-based expert loss. Our analysis reveals that this architecture induces a marked \textbf{heterogeneous specialization} across layers: some layers differentiate into "science" and "humanities" domains, while others converge on general functions. This demonstrates a learned, hierarchical division of labor and paves the way for a new, more efficient scaling dimension based on the number of structured experts.
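The sketch below is a minimal, hypothetical reading of the mechanisms named in the abstract, not the authors' released code: a "Knowledge Block" layer that reuses Key/Value projections, an Attention Gating Control (AGC) router built as a small VAE with ReLU gates in place of softmax and a load balancer, and a composite loss combining next-token cross-entropy with a KL-divergence expert term. The LDA pre-stage is omitted, and every class, function, and hyperparameter name (e.g. `AGCKnowledgeBlockLayer`, `n_blocks`, `beta`) is an illustrative assumption.

```python
# Illustrative sketch only; all names and shapes are assumptions, not the paper's API.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AGCKnowledgeBlockLayer(nn.Module):
    """Hypothetical layer: K/V-derived Knowledge Blocks composed by a VAE-based AGC router."""

    def __init__(self, d_model: int, n_blocks: int, d_latent: int = 32):
        super().__init__()
        assert d_model % n_blocks == 0, "assumed: d_model divisible into n_blocks chunks"
        self.n_blocks = n_blocks
        # Projections standing in for the attention Key/Value matrices being reused.
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # VAE-style router: encoder -> (mu, logvar) -> gate head.
        self.enc = nn.Linear(d_model, d_latent)
        self.mu = nn.Linear(d_latent, d_latent)
        self.logvar = nn.Linear(d_latent, d_latent)
        self.gate_head = nn.Linear(d_latent, n_blocks)

    def forward(self, h: torch.Tensor):
        # h: (batch, seq, d_model)
        B, T, D = h.shape
        K = self.k_proj(h)
        V = self.v_proj(h)

        # Synthesize Knowledge Blocks on the fly: split the K/V interaction
        # features into n_blocks chunks, one chunk per block.
        kv = (K * V).view(B, T, self.n_blocks, D // self.n_blocks)

        # AGC: VAE router with reparameterization and ReLU gates
        # (no softmax normalization, no auxiliary load-balancing loss).
        e = torch.tanh(self.enc(h))
        mu, logvar = self.mu(e), self.logvar(e)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        gates = F.relu(self.gate_head(z))            # (B, T, n_blocks), sparse mixing weights

        # Compose the output as a gated, residual combination of Knowledge Blocks.
        out = h + (gates.unsqueeze(-1) * kv).reshape(B, T, D)

        # KL term of the router, contributed to the composite "expert loss".
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return out, kl


def composite_loss(logits, targets, kl_terms, beta: float = 0.01):
    """Next-token cross-entropy plus a KL-divergence expert loss (weight beta is assumed)."""
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    return ce + beta * torch.stack(kl_terms).mean()
```

One design point this sketch tries to mirror: because the gates are ReLU outputs rather than a softmax distribution, any number of Knowledge Blocks (including none) can be active for a token, which is one plausible way the layer-wise specialization described in the abstract could emerge without an explicit load-balancing objective.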
Primary Area: foundation or frontier models, including LLMs
Submission Number: 13711