When Reading the Chain of Thought Falls Short: A Testbed for Reasoning Trace Analysis

Daria Ivanova; Riya Tyagi; Joshua Engels; Neel Nanda

Published: 11 Jun 2026, Last Modified: 11 Jun 2026. Venue: Mech Interp Workshop ICML 2026 (Poster). Readers: Everyone. License: CC BY 4.0

Keywords: Methods (probing, steering, causal interventions), Benchmarking Interpretability, Interpretability for AI Safety

Other Keywords: Chain of thought interpretability

TL;DR: We release a CoT-interpretability testbed of nine tasks that stress-tests where reading the chain of thought breaks down. Across thirteen baselines, no method dominates, and methods that go beyond reading the CoT (activation probes, tool-using agents) often win.

Abstract: Reading the chain of thought (CoT) is a widely used safety technique for reasoning models, but it struggles when the CoT leaves out or misrepresents the factors driving a behavior. However, we lack benchmarks that focus on these cases where reading the CoT fails, so progress on alternative methods is hard to measure. To address this gap, we introduce and release nine novel CoT analysis tasks, each with in-distribution (ID) and out-of-distribution (OOD) test sets. All nine tasks are extremely challenging, with both prompt-optimized frontier LLM monitors and human reviewers frequently achieving no better than chance. We benchmark probes, term frequency methods, LLM monitors, and an LLM agent with interpretability affordances. We focus on OOD performance, since ID results often reflect dataset-specific shortcuts. We find that no method dominates: narrow classifiers, an LLM agent, and LLM monitors all win on different tasks. We provide a lower bound baseline for future work by ensembling all methods with a select-on-ID and score-on-OOD protocol; this ensemble beats the human baseline on 6 / 7 tasks. We believe that our testbed gives future CoT analysis methods a non-saturated hill to climb.
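
The abstract does not spell out the select-on-ID and score-on-OOD protocol. One plausible reading is that, for each task, the method with the best in-distribution score is chosen, and only that method's out-of-distribution score is reported. The sketch below illustrates that reading. The function name, method names, task names, and all scores are hypothetical placeholders, not values from the paper, and the authors' actual ensembling procedure may differ.

```python
from typing import Dict, Tuple

# Hypothetical layout: task name -> method name -> score (higher is better).
Scores = Dict[str, Dict[str, float]]


def select_on_id_score_on_ood(
    id_scores: Scores, ood_scores: Scores
) -> Dict[str, Tuple[str, float]]:
    """For each task, pick the method with the best ID score and
    report that method's OOD score (assumed reading of the protocol)."""
    results: Dict[str, Tuple[str, float]] = {}
    for task, per_method in id_scores.items():
        # Iterate methods in sorted order so ties resolve alphabetically.
        best_method = max(sorted(per_method), key=per_method.get)
        results[task] = (best_method, ood_scores[task][best_method])
    return results


# Made-up numbers, purely for illustration.
id_scores = {
    "task_a": {"probe": 0.90, "monitor": 0.70},
    "task_b": {"probe": 0.60, "monitor": 0.80},
}
ood_scores = {
    "task_a": {"probe": 0.65, "monitor": 0.70},
    "task_b": {"probe": 0.50, "monitor": 0.75},
}

ensemble = select_on_id_score_on_ood(id_scores, ood_scores)
for task, (method, score) in ensemble.items():
    print(f"{task}: selected {method}, OOD score {score:.2f}")
```

In this toy data, the method selected on ID for task_a is not the best one on OOD. The selection step never sees OOD scores, so under this reading the ensemble's reported number is not tuned on the test distribution.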

Submission Number: 524
