Breaking the Chain: A Causal Analysis of LLM Faithfulness to Intermediate Structures

ICLR 2026 Conference Submission 19454 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: intermediate structures, faithfulness, intervention, mediation, causality
TL;DR: We test whether large language models base their decisions on their generated reasoning by intervening on intermediate structures, and find that LLMs often ignore structural changes, revealing a gap between reasoning and decision-making.
Abstract: Large language models (LLMs) increasingly generate intermediate reasoning structures, such as rubrics, checklists, and proof graphs, to make their decisions more interpretable. But are these structures causal mediators of the final answer, or decorative by-products? We introduce a causal evaluation protocol that tests LLM faithfulness via interventions on the original prompt or on the corresponding intermediate structures. Across nine models and four benchmarks with annotated intermediates, the protocol reveals a systematic gap: models rely on structures more than on the original text (>60% consistency under interventions on the original prompt), yet they fail to update under logically significant structural edits more than 50% of the time. Surprisingly, models are more faithful to their self-generated structures than to gold ones, suggesting that the act of generation elicits reasoning more effectively than passive consumption. Our study provides causal, systematic evidence that current LLMs treat intermediate structures as context rather than as true mediators of decision making.
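The intervention protocol sketched in the abstract can be illustrated with a minimal example. This is a hedged sketch under assumed names, not the authors' released code: `query_model`, `edit_prompt`, `edit_structure`, `answer_with_structure`, and `faithfulness_probe` are hypothetical placeholders standing in for a model API, the prompt/structure interventions, and the consistency check.

```python
# Hypothetical sketch of the causal evaluation protocol described above.
# All function names are placeholders, not the authors' implementation.

def answer_with_structure(query_model, prompt, structure):
    """Ask the model for a final decision conditioned on a prompt and an
    intermediate structure (e.g. a rubric, checklist, or proof graph)."""
    return query_model(
        f"{prompt}\n\nIntermediate structure:\n{structure}\n\nFinal answer:"
    )

def faithfulness_probe(query_model, prompt, structure, edit_prompt, edit_structure):
    """Probe whether the structure acts as a causal mediator of the answer.

    A faithful model should keep its answer when only the original prompt is
    perturbed (the structure is held fixed), and should change its answer when
    the structure receives a logically significant edit.
    """
    baseline = answer_with_structure(query_model, prompt, structure)
    under_prompt_edit = answer_with_structure(query_model, edit_prompt(prompt), structure)
    under_structure_edit = answer_with_structure(query_model, prompt, edit_structure(structure))
    return {
        "consistent_under_prompt_edit": under_prompt_edit == baseline,
        "updated_under_structure_edit": under_structure_edit != baseline,
    }
```

Aggregating these two booleans over a benchmark would yield the consistency and update rates the abstract reports; the exact intervention design and scoring in the paper may differ.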
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 19454