Mechanistic Evidence for Faithfulness Decay in Chain-of-Thought Reasoning

Published: 02 Mar 2026, Last Modified: 18 Mar 2026, LIT Workshop @ ICLR 2026, CC BY 4.0
Track: long paper (up to 10 pages)
Keywords: Chain-of-Thought, Faithfulness, Mechanistic Interpretability, Normalized Logit Difference Decay, NLDD, Reasoning Horizon, Large Language Models, Causal Inference, Counterfactual Analysis
TL;DR: We introduce NLDD to quantify the causal influence of reasoning steps, identifying a "reasoning horizon" beyond which reliance on the Chain-of-Thought degrades, and distinguishing genuine reasoning from post-hoc rationalization.
Abstract: Chain-of-Thought (CoT) explanations are widely used to interpret how language models solve complex problems, yet it remains unclear whether these step-by-step explanations reflect how the model actually reaches its answer or are merely post-hoc justifications. We propose \textbf{Normalized Logit Difference Decay (NLDD)}, a metric that measures whether individual reasoning steps are faithful to the model's decision-making process. Our approach corrupts individual reasoning steps in the explanation and measures the resulting drop in the model's confidence in its answer, determining whether each step genuinely contributes to the prediction. By standardizing these measurements, NLDD enables rigorous cross-model comparison across different architectures. Testing three model families across syntactic, logical, and arithmetic tasks, we discover a consistent \textbf{Reasoning Horizon ($k^*$)} at 70–85\% of chain length, beyond which reasoning tokens have little or even negative effect on the final answer. We also find that models can encode correct internal representations while completely failing the task. These results show that accuracy alone does not reveal whether a model actually reasons through its chain. NLDD offers a way to measure when CoT matters.
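The corrupt-and-measure procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `model` callable, the token-id indexing, and the normalization by the clean margin are all illustrative assumptions.

```python
# Illustrative sketch of the NLDD idea: corrupt each reasoning step,
# measure the drop in the model's answer margin, and normalize.
# `model(text)` is assumed to return per-token logits for the answer position;
# `corrupt(step)` is any corruption function (e.g. token shuffling).

def logit_diff(logits, correct_id, foil_id):
    """Confidence margin: logit of the correct answer minus a foil answer."""
    return logits[correct_id] - logits[foil_id]

def nldd(model, prompt, steps, corrupt, correct_id, foil_id):
    """Per-step normalized decay: how much corrupting step i shrinks the
    clean answer margin, as a fraction of that clean margin."""
    clean = logit_diff(model(prompt + "".join(steps)), correct_id, foil_id)
    scores = []
    for i in range(len(steps)):
        corrupted = steps[:i] + [corrupt(steps[i])] + steps[i + 1:]
        margin = logit_diff(model(prompt + "".join(corrupted)),
                            correct_id, foil_id)
        scores.append((clean - margin) / abs(clean))  # decay for step i
    return scores
```

A step whose corruption erases the margin scores near 1 (causally load-bearing); a step whose corruption leaves the margin intact scores near 0, which is the signature of post-hoc rationalization past the reasoning horizon.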
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 27