Test-Time Layer Recurrence Enables Ultra-Deep Thinking in LLMs Without Chain-of-Thought

ICLR 2026 Conference Submission 2910 Authors

08 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: CoT, LLM
Abstract: Transformers possess a \textbf{neural depth} of only $O(1)$, which restricts them primarily to \textbf{inductive} reasoning problems of bounded depth. In contrast, recurrent models allow the latent reasoning state $\mathbf{h}$ to be updated sequentially across arbitrarily many recurrent steps, enabling them to handle tasks that require deep reasoning. Owing to their non-recurrent architecture, Transformer-based large language models (LLMs) struggle on such tasks, performing poorly in domains like chess, multi-digit multiplication, and long-range counting. The emergence of Chain-of-Thought (CoT) reasoning has partially mitigated this limitation by simulating temporal recurrence through latent-to-text-to-latent conversion, thereby granting Transformer LLMs theoretically unbounded neural depth under ideal conditions. However, CoT comes at the cost of very long generation sequences and poor time efficiency. Recent work has shown that reasoning depth can also be extended in the \emph{vertical} direction by repeating Transformer layers, complementing the \emph{temporal} depth introduced by CoT. These two approaches--horizontal depth extension via CoT and vertical depth extension via layer recurrence--exhibit distinct theoretical and practical properties, yet both hold strong promise for boosting the reasoning capabilities of Transformer-based LLMs. In this paper, we present both a theoretical analysis and an empirical comparison of these two paradigms, and demonstrate how each contributes to enhancing computational power and downstream performance, particularly in ultra-long reasoning scenarios where standard Transformers are most limited.
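
To make the vertical paradigm concrete, the following PyTorch sketch illustrates test-time layer recurrence: a shared block of Transformer layers is applied repeatedly to the latent state h, so effective neural depth grows with the number of recurrent steps rather than with the number of generated tokens, as in CoT. This is a minimal illustration under assumed hyperparameters; the class name RecurrentBlock and all dimensions are hypothetical, not the authors' implementation.

import torch
import torch.nn as nn

class RecurrentBlock(nn.Module):
    """A stack of Transformer layers reused across recurrent steps (illustrative)."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.block = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, h: torch.Tensor, num_steps: int) -> torch.Tensor:
        # Sequentially update the latent state h: each recurrent pass adds
        # n_layers of neural depth at test time without emitting any tokens.
        for _ in range(num_steps):
            h = self.block(h)
        return h

# Usage: deepen reasoning at inference by raising num_steps, with no extra
# generated sequence length (unlike horizontal depth extension via CoT).
h = torch.randn(1, 16, 512)              # (batch, sequence, d_model) latent state
deep_h = RecurrentBlock()(h, num_steps=8)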
Primary Area: foundation or frontier models, including LLMs
Submission Number: 2910