GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism
Keywords: Mixture-of-Experts, Low-Rank Adaptation, Self-Rethinking Mechanism, Recurrent Neural Network, Cognitive Depth
Abstract: Traditional Mixture-of-Experts (MoE) networks benefit from utilizing multiple smaller expert models as opposed to a single large network. However, these experts typically operate independently, leaving open the question of whether interconnecting them could enhance the performance of MoE networks. In response, we introduce GRAPHMOE, which aims to augment the cognitive depth of language models via a self-rethinking mechanism built on Pseudo Graph MoE networks. GRAPHMOE employs a recurrent routing strategy that allows the model to iteratively revisit and refine intermediate reasoning states. At the same time, by treating experts as interconnected nodes in a pseudo-graph, it enables iterative information exchange among experts rather than treating them as isolated modules. We implement the GRAPHMOE architecture using Low-Rank Adaptation (LoRA) and conduct extensive experiments on several benchmark datasets. The results show that GRAPHMOE outperforms other LoRA-based models, achieving state-of-the-art performance.
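The abstract describes the architecture only at a high level; as a rough illustration of the recurrent ("self-rethinking") routing idea over LoRA experts, here is a minimal PyTorch sketch. Everything in it is an assumption for illustration, not the authors' implementation: the class names LoRAExpert and GraphMoELayer, the top-k softmax router, the residual state update, and all hyperparameters (rank, num_experts, top_k, num_steps) are hypothetical choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAExpert(nn.Module):
    """One expert as a low-rank adapter: delta(x) = B(A x) * (alpha / r), r << d."""
    def __init__(self, d_model: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)  # A: d -> r
        self.up = nn.Linear(rank, d_model, bias=False)    # B: r -> d
        nn.init.zeros_(self.up.weight)                    # standard LoRA init: start as identity
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x)) * self.scale

class GraphMoELayer(nn.Module):
    """Recurrent routing over LoRA experts (illustrative sketch).

    Instead of a single routing pass, the aggregated expert output is fed
    back through the router for num_steps iterations, so experts exchange
    information via the shared hidden state, as if they were nodes of a
    pseudo-graph connected through the router.
    """
    def __init__(self, d_model: int, num_experts: int = 4,
                 top_k: int = 2, rank: int = 8, num_steps: int = 3):
        super().__init__()
        self.experts = nn.ModuleList(
            LoRAExpert(d_model, rank) for _ in range(num_experts))
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k
        self.num_steps = num_steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x
        for _ in range(self.num_steps):           # "rethinking" iterations
            logits = self.router(h)               # (..., num_experts)
            weights, idx = logits.topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)  # normalize over selected experts
            update = torch.zeros_like(h)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = (idx[..., k] == e).unsqueeze(-1)
                    update = update + mask * weights[..., k:k+1] * expert(h)
            h = h + update                        # residual refinement of the state
        return h

# usage: refine a batch of token states with three rethinking steps
layer = GraphMoELayer(d_model=64, num_experts=4, top_k=2, num_steps=3)
out = layer(torch.randn(2, 10, 64))               # (batch, seq, d_model)
print(out.shape)
```

The key design point in this sketch is the feedback loop: because the aggregated expert output re-enters the router at each step, every expert's contribution influences the next round of routing and expert computation, loosely approximating message passing on a fully connected expert graph rather than a single independent dispatch.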
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: Language Modeling, Linguistic Theories, Cognitive Modeling and Psycholinguistics, Machine Learning for NLP
Contribution Types: NLP engineering experiment, Approaches to low-compute settings (efficiency), Publicly available software and/or pre-trained models, Theory
Languages Studied: English
Submission Number: 5058