Tight Error Propagation Bounds for Multi-Step Chain-of-Thought Reasoning

TMLR Paper 7164 Authors

26 Jan 2026 (modified: 07 Feb 2026) · Under review for TMLR · CC BY 4.0
Abstract: Chain-of-thought (CoT) reasoning enables large language models to solve complex problems, but understanding when these reasoning chains fail remains an open theoretical challenge. While recent work characterizes the computational expressivity of CoT, the fundamental question of reliability (how errors accumulate across steps) lacks rigorous foundations. We develop a Markov chain framework that models CoT as a stochastic process over reasoning states, enabling formal analysis of error propagation. Our contributions establish: (1) tight bounds proving that the error probability grows as $1-(1-\varepsilon)^n$ over $n$ steps with per-step error $\varepsilon$; (2) a characterization of verification overhead showing that $k$-redundant verification reduces the error to $O(n^{k+1}\varepsilon^{k+1})$; (3) a contractive self-correction analysis proving exponential convergence with mixing time $O(\log n/|\log q|)$ when the contraction factor satisfies $q < 1$; (4) information-theoretic impossibility results via Fano's inequality; and (5) concentration inequalities via martingale theory. We validate these predictions through systematic experiments on synthetic tasks (bounds tight to within 5%) and on real LLM reasoning over the PRM800K, GSM8K, and HumanEval datasets, demonstrating that our framework accurately predicts failure rates across domains (mathematical reasoning and code generation). For practitioners: the safe chain length is $n \lesssim \delta/\varepsilon$ without verification, while $k$-fold verification extends this to $n \lesssim \delta^{1/(k+1)}/\varepsilon$.
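The no-verification bound quoted in the abstract can be sanity-checked numerically. The following is a minimal sketch (not taken from the paper) assuming i.i.d. per-step errors with rate `eps` and no self-correction, so that a chain fails if any single step errs; under that assumption the failure probability is exactly $1-(1-\varepsilon)^n$.

```python
# Monte Carlo check of the no-verification error bound 1 - (1 - eps)^n,
# assuming i.i.d. per-step errors and no self-correction (a chain fails
# as soon as any one of its n steps errs).
import random

def chain_failure_rate(n, eps, trials=100_000, seed=0):
    """Fraction of simulated n-step chains containing at least one error."""
    rng = random.Random(seed)
    fails = sum(
        any(rng.random() < eps for _ in range(n))
        for _ in range(trials)
    )
    return fails / trials

n, eps = 20, 0.01
predicted = 1 - (1 - eps) ** n   # closed-form bound from the abstract
observed = chain_failure_rate(n, eps)
print(f"predicted={predicted:.4f} observed={observed:.4f}")
```

For small $\varepsilon$ this is approximately $n\varepsilon$, which is where the practitioner rule of thumb $n \lesssim \delta/\varepsilon$ for a target failure budget $\delta$ comes from.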
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Ilia_Sucholutsky1
Submission Number: 7164