Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We prove lower bounds on the length of chains of thought required for algorithmic problems.
Abstract: Chain-of-thought reasoning and scratchpads have emerged as critical tools for enhancing the computational capabilities of transformers. While theoretical results show that polynomial-length scratchpads can extend transformers' expressivity from $TC^0$ to $PTIME$, the length they actually require remains poorly understood. Empirical evidence suggests that transformers need scratchpads even for many problems in $TC^0$, such as Parity or Multiplication, challenging optimistic bounds derived from circuit complexity. In this work, we initiate a systematic study of lower bounds on the number of CoT steps across different algorithmic problems, in the hard-attention regime. We study a variety of algorithmic problems and provide bounds that are tight up to logarithmic factors. Overall, these results contribute to an emerging understanding of the power and limitations of chain-of-thought reasoning.
Lay Summary: LLMs get substantially better at solving reasoning problems when they are allowed to output intermediate steps, a technique known as "chain-of-thought reasoning". However, these chains of intermediate steps add computational cost. We show theoretically that, on various reasoning problems, LLMs likely need to produce a large number of intermediate steps when the input is long. This research matters because it reveals fundamental barriers to solving reasoning tasks efficiently with LLMs.
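
To make the notion of "CoT steps" concrete, here is a minimal illustrative sketch (not from the paper) of what a step-by-step scratchpad for the Parity problem mentioned in the abstract might look like: each intermediate line is one CoT step, and the number of steps grows linearly with the input length, the kind of quantity the paper's lower bounds concern. The function name and step format are hypothetical.

```python
def parity_with_scratchpad(bits):
    """Return the parity of `bits`, emitting one intermediate step per bit.

    Hypothetical illustration of a chain-of-thought style scratchpad:
    each appended line plays the role of one CoT step, so the chain
    length scales linearly with the input length n.
    """
    acc = 0
    steps = []
    for i, b in enumerate(bits):
        acc ^= b  # running XOR of the prefix seen so far
        steps.append(f"step {i}: prefix parity = {acc}")
    return acc, steps

answer, chain = parity_with_scratchpad([1, 0, 1, 1])
print("\n".join(chain))
print("answer:", answer)  # -> 1
```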
Primary Area: Theory
Keywords: theory, lower bounds, chain-of-thought, transformers
Submission Number: 12429