Keywords: Structural causal model, Chain-of-Thought, Causal modeling
Abstract: The integration of Chain-of-Thought (CoT) into large language models has advanced their reasoning capabilities. However, how CoT produces correct answers through stepwise reasoning, and why it often makes mistakes, remains poorly understood, as the causality between reasoning steps is difficult to quantify. This limitation raises an open question: Is it necessary to inject causality into CoT reasoning? In this paper, we formalize CoT as a structural causal model, representing the reasoning process as a causal graph and thereby giving it a complete mathematical foundation. On this basis, we develop a step-level causal correction algorithm, Causalizing Chain-of-Thought (CauCoT), which identifies causally erroneous steps in CoT (i.e., incorrect or unintelligible steps) based on a newly defined CoT Average Causal Effect, and iteratively updates them until all steps are causally correct, a state we define as relaxed causal correctness. Given the lack of datasets for evaluating the impact of causality on CoT reasoning, we release the Causal Reasoning Benchmark (CRBench), the first benchmark targeting causal errors in CoT, which comprises both causally labeled real CoT reasoning errors and newly generated CoT with injected causal errors. Experimental results on LLMs demonstrate that CauCoT can efficiently correct causal errors in CoT and improve the understandability of reasoning. We inject causality into CoT reasoning at the mathematical, algorithmic, dataset, and empirical levels, thereby providing strong evidence for the necessity of causality in achieving correct and interpretable stepwise reasoning.
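To make the abstract's correction loop concrete, here is a minimal Python sketch of a CauCoT-style procedure under stated assumptions: the helpers estimate_ace (scoring a step's CoT Average Causal Effect) and regenerate_step (e.g., an LLM call rewriting a flagged step), along with the threshold and iteration budget, are hypothetical illustrations, not the paper's API.

# Hypothetical sketch of a CauCoT-style correction loop; the helper
# callables and threshold are illustrative assumptions, not the paper's API.
from typing import Callable, List

def causalize_cot(
    steps: List[str],
    estimate_ace: Callable[[List[str], int], float],   # CoT Average Causal Effect of step i
    regenerate_step: Callable[[List[str], int], str],  # e.g., an LLM rewrite of step i
    threshold: float = 0.0,
    max_iters: int = 10,
) -> List[str]:
    """Iteratively rewrite steps whose estimated causal effect falls at or
    below the threshold, stopping once every step is causally correct (the
    paper's 'relaxed causal correctness') or the iteration budget runs out."""
    for _ in range(max_iters):
        flagged = [i for i in range(len(steps))
                   if estimate_ace(steps, i) <= threshold]
        if not flagged:  # relaxed causal correctness reached
            break
        for i in flagged:
            steps[i] = regenerate_step(steps, i)
    return steps

The loop mirrors the abstract's description: score each step, flag causally erroneous ones, update them, and repeat until no step is flagged.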
Supplementary Material: zip
Primary Area: causal reasoning
Submission Number: 6033