GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism
Keywords: Mixture-of-Experts, Low-Rank Adaptation, Self-Rethinking Mechanism, Recurrent Neural Network, Cognitive Depth
Abstract: Traditional Mixture-of-Experts (MoE) networks benefit from utilizing multiple smaller expert models as opposed to a single large network. However, these experts typically operate independently, leaving open the question of whether interconnecting them could enhance the performance of MoE networks. In response, we introduce GRAPHMOE, which aims to augment the cognitive depth of language models via a self-rethinking mechanism built on Pseudo Graph MoE networks. GRAPHMOE employs a recurrent routing strategy that allows the model to iteratively revisit and refine intermediate reasoning states. At the same time, by treating experts as interconnected nodes in a pseudo-graph, it enables iterative information exchange among experts rather than treating them as isolated modules. We implement the GRAPHMOE architecture using Low-Rank Adaptation (LoRA) and conduct extensive experiments on several benchmark datasets. The results show that GRAPHMOE outperforms other LoRA-based models, achieving state-of-the-art performance.
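The abstract describes the architecture only at a high level; as a rough illustration of the recurrent ("self-rethinking") routing idea over LoRA experts, here is a minimal PyTorch sketch. Everything in it is an assumption for illustration, not the authors' implementation: the class names LoRAExpert and GraphMoELayer, the top-k softmax router, the residual state update, and all hyperparameters (rank, num_experts, top_k, num_steps) are hypothetical choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAExpert(nn.Module):
    """One expert as a low-rank adapter: delta(x) = B(A x) * (alpha / r), r << d."""
    def __init__(self, d_model: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)  # A: d -> r
        self.up = nn.Linear(rank, d_model, bias=False)    # B: r -> d
        nn.init.zeros_(self.up.weight)                    # standard LoRA init: start as identity
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x)) * self.scale

class GraphMoELayer(nn.Module):
    """Recurrent routing over LoRA experts (illustrative sketch).

    Instead of a single routing pass, the aggregated expert output is fed
    back through the router for num_steps iterations, so experts exchange
    information via the shared hidden state, as if they were nodes of a
    pseudo-graph connected through the router.
    """
    def __init__(self, d_model: int, num_experts: int = 4,
                 top_k: int = 2, rank: int = 8, num_steps: int = 3):
        super().__init__()
        self.experts = nn.ModuleList(
            LoRAExpert(d_model, rank) for _ in range(num_experts))
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k
        self.num_steps = num_steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x
        for _ in range(self.num_steps):           # "rethinking" iterations
            logits = self.router(h)               # (..., num_experts)
            weights, idx = logits.topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)  # normalize over selected experts
            update = torch.zeros_like(h)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = (idx[..., k] == e).unsqueeze(-1)
                    update = update + mask * weights[..., k:k+1] * expert(h)
            h = h + update                        # residual refinement of the state
        return h

# usage: refine a batch of token states with three rethinking steps
layer = GraphMoELayer(d_model=64, num_experts=4, top_k=2, num_steps=3)
out = layer(torch.randn(2, 10, 64))               # (batch, seq, d_model)
print(out.shape)
```

The key design point in this sketch is the feedback loop: because the aggregated expert output re-enters the router at each step, every expert's contribution influences the next round of routing and expert computation, loosely approximating message passing on a fully connected expert graph rather than a single independent dispatch.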
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: Language Modeling, Linguistic Theories, Cognitive Modeling and Psycholinguistics, Machine Learning for NLP
Contribution Types: NLP engineering experiment, Approaches to low-compute settings (efficiency), Publicly available software and/or pre-trained models, Theory
Languages Studied: English
Submission Number: 5058