Characterizing Backtracking in CoT through Internal Probes and Surface-Level Features

Published: 05 Mar 2026, Last Modified: 25 Apr 2026 · ICLR 2026 Workshop LLM Reasoning · CC BY 4.0
Track: tiny / short paper (up to 4 pages)
Keywords: backtracking, reasoning, llms, cot, chain of thought, probing, interpretability
Abstract: Chain-of-thought (CoT) traces from reasoning models often include revisions of intermediate reasoning steps, a behavior we term backtracking. We explore when and why backtracking occurs in reasoning. Using an automated annotation pipeline, we find that backtracking is rare (3-10% of reasoning chunks) and highly autocorrelated. We further compare surface-level predictors with linear probes on hidden states to identify features predictive of backtracking. While surface features provide substantial signal (ROC-AUC up to 0.80), hidden-state probes prove superior for both detecting current backtracking and predicting its onset in the next step (TPR@5%FPR up to 0.47). Our results indicate that backtracking reflects a structured internal regime during generation rather than merely superficial linguistic cues.
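The probing setup described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the hidden states are synthetic stand-ins (real ones would come from a reasoning model's residual stream), the backtracking labels are simulated at a rare base rate, and the probe is a plain logistic regression evaluated with ROC-AUC and TPR at 5% FPR.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
d, n = 64, 2000  # hidden-state dimension and number of reasoning chunks (illustrative)

# Synthetic stand-in for hidden states: backtracking chunks (y = 1)
# are shifted along a fixed direction so a linear probe can find them.
direction = rng.normal(size=d)
y = (rng.random(n) < 0.07).astype(int)  # ~7% backtracking rate, matching "rare"
X = rng.normal(size=(n, d)) + 0.8 * np.outer(y, direction)

# Linear probe: logistic regression on the hidden state of each chunk.
train, test = slice(0, 1500), slice(1500, n)
probe = LogisticRegression(max_iter=1000).fit(X[train], y[train])
scores = probe.predict_proba(X[test])[:, 1]

# Metrics used in the abstract: ROC-AUC and TPR at a fixed 5% FPR.
auc = roc_auc_score(y[test], scores)
fpr, tpr, _ = roc_curve(y[test], scores)
tpr_at_5fpr = tpr[np.searchsorted(fpr, 0.05, side="right") - 1]
```

In the paper's setting the same recipe would be applied per layer, and compared against a classifier over surface features (e.g. token-level cues) rather than hidden states.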
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 102