Keywords: LLM, CoT
Abstract: Despite their power as general sequence processors, Transformers systematically fail at \textbf{simple sequential arithmetic tasks like counting}. While Chain-of-Thought (CoT) prompting circumvents the Transformer's architectural limits for such iterative computations, its practical application is plagued by brittleness over long sequences. We propose a new perspective on this failure, identifying an architectural conflict we term \textbf{State-Update Interference (SUI)}. We posit that self-attention's inductive bias for global, semantic association can disrupt the localized, state-dependent updates required by procedural algorithms. Paradoxically, CoT may exacerbate this by unrolling the entire computational history, creating an ever-growing set of distractors that are semantically similar but logically irrelevant, thereby corrupting the state-update process.
To investigate this hypothesis, we introduce \textbf{Sequential State Quarantining (SSQ)}, a diagnostic instrument designed to isolate this failure mode. SSQ periodically forces the model to compress its reasoning trace into a compact state while discarding the preceding context, surgically enforcing the narrow information bottleneck required for procedural logic. On a suite of algorithmic tasks, SSQ yields dramatic performance gains, with accuracy scaling monotonically with the frequency of this intervention. Our findings suggest that a primary bottleneck for procedural reasoning is architectural: a failure of \textbf{context management} that is distinct from general limitations of context length or logical capacity. This reframes the problem, suggesting a need for models that can learn to actively manage their long context. Our source code is provided at an anonymous
\href{https://anonymous.4open.science/r/Recurrent-CoT-A344}{link}.
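To make the SSQ procedure described in the abstract concrete, here is a minimal sketch of the quarantine loop: reason for a bounded number of steps, compress the trace into a compact state, discard the preceding context, and continue from the compressed state alone. This is an illustrative assumption, not the authors' released implementation (see the anonymous link above); the `generate` callable, `quarantine_interval`, and all prompt wording are hypothetical.

\begin{verbatim}
# Sketch of Sequential State Quarantining (SSQ): every few reasoning steps,
# the model compresses its trace into a compact state, and the earlier
# step-by-step text is discarded so it cannot interfere with later updates.
# `generate`, `quarantine_interval`, and the prompts are illustrative only.

def ssq_solve(generate, task_prompt, quarantine_interval=4, max_rounds=16):
    """Run chain-of-thought in short bursts, quarantining state in between.

    `generate(prompt) -> str` is any LLM completion function (assumed).
    """
    state = "No computation performed yet."
    for _ in range(max_rounds):
        # 1. Continue the computation from the compressed state only;
        #    the full prior reasoning trace is NOT included in the prompt.
        trace = generate(
            f"Task: {task_prompt}\n"
            f"Current state: {state}\n"
            f"Continue the computation for at most {quarantine_interval} "
            f"steps, then stop. If finished, write 'FINAL ANSWER: <answer>'."
        )
        # 2. Quarantine step: compress the new trace into a compact state
        #    and drop the verbose reasoning text.
        state = generate(
            f"Task: {task_prompt}\n"
            f"Reasoning so far:\n{trace}\n"
            f"Summarize, in one line, only the variables needed to continue "
            f"(or repeat the FINAL ANSWER line if present)."
        )
        # 3. Stop once the model has committed to a final answer.
        if "FINAL ANSWER:" in state:
            return state.split("FINAL ANSWER:")[-1].strip()
    return state
\end{verbatim}

Increasing the quarantine frequency (a smaller `quarantine_interval` in this sketch) corresponds to the intervention whose frequency the abstract reports as scaling monotonically with accuracy.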
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 3285