TAR: Token Adaptive Routing Framework for LLM Token-Level Semantic Correction Inspired by Neuro-Linguistic Pathways
Keywords: large language models; math reasoning; brain-inspired; adaptive routing; token semantic correction
TL;DR: We propose a brain-inspired Token Adaptive Routing framework that enables LLMs to self-correct token-level semantic errors, improving reasoning accuracy while reducing inference tokens.
Abstract: Large language models (LLMs) often suffer from cascading errors in math reasoning due to token-level semantic defects. A key limitation is that reliance on unidirectional feedforward pathways leaves LLMs unable to dynamically correct token-level defects during reasoning. In contrast, neuro-linguistic pathways in the human brain—centered on Broca’s and Wernicke’s areas—operate as a closed loop, integrating semantics through feedforward pathways while leveraging feedback circuits for error correction and signal adaptation. This loop involves conflict detection in the anterior cingulate cortex (ACC), cross-regional error transmission via the arcuate fasciculus and inferior fronto-occipital fasciculus (IFOF), and compensatory reprocessing in the circuit between the dorsolateral prefrontal cortex (DLPFC) and Broca’s area. Inspired by this functional architecture, we propose a Token Adaptive Routing (TAR) framework that establishes a brain-inspired self-correcting loop in LLMs without requiring parameter fine-tuning. TAR comprises three components: (1) \textbf{Semantic Defect Monitor}, analogous to the ACC, which identifies tokens with semantic defects; (2) \textbf{Adaptive Router}, resembling the arcuate fasciculus/IFOF, which routes defective tokens to the most compatible LLM functional block; and (3) \textbf{Feedback-based Re-representation}, inspired by the DLPFC–Broca circuit, which corrects the semantic defects. Experiments show that TAR improves accuracy while reducing the number of inference tokens. On the challenging AIME25 benchmark, TAR improves the accuracy of Qwen3-1.7B by +3.36% while reducing inference tokens by 13.7%. Furthermore, we reveal that maintaining high token confidence is essential for reasoning performance, and that deeper blocks in LLMs play a crucial role in shortening reasoning depth. Our code is available at https://anonymous.4open.science/r/warehouse-25F5
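The three-stage loop described in the abstract (monitor → route → re-represent) can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: the function names, the confidence threshold, the compatibility scores, and the toy "blocks" are all assumptions introduced for clarity.

```python
# Hypothetical sketch of a TAR-style self-correcting loop.
# All names, thresholds, and block definitions are illustrative assumptions;
# the paper's actual components operate on real LLM hidden states.

def token_confidence(probs):
    """Confidence of a token = probability assigned to the sampled token."""
    return max(probs)

def detect_defect(probs, threshold=0.5):
    """Semantic Defect Monitor (ACC analogue): flag low-confidence tokens."""
    return token_confidence(probs) < threshold

def route(defect_score, blocks):
    """Adaptive Router (arcuate fasciculus/IFOF analogue): send the
    defective token to the block with the highest compatibility score."""
    return max(blocks, key=lambda b: b["compatibility"](defect_score))

def re_represent(hidden, block):
    """Feedback-based Re-representation (DLPFC-Broca analogue):
    re-process the token's hidden state through the chosen block."""
    return block["fn"](hidden)

# Toy usage: a flat next-token distribution signals a defect,
# and the more severe the defect, the deeper the block chosen.
blocks = [
    {"name": "shallow", "compatibility": lambda s: 1 - s, "fn": lambda h: h},
    {"name": "deep",    "compatibility": lambda s: s,
     "fn": lambda h: [x * 0.9 for x in h]},
]
probs = [0.3, 0.25, 0.25, 0.2]           # flat distribution -> low confidence
if detect_defect(probs):
    chosen = route(1 - token_confidence(probs), blocks)
    hidden = re_represent([1.0, 2.0], chosen)
```

In this toy run the top-token probability is 0.3, so the monitor flags the token, the router prefers the deeper block, and the hidden state is reprocessed before generation continues.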
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 24412