Evaluating the Causal Effect of Chain-of-Thought on Groundedness in Tool-Use Agents via Counterfactual Mutations
Keywords: Chain-of-Thought (CoT), Groundedness / Faithfulness, Retrieval-Augmented Generation (RAG), Counterfactual Evaluation / Causal Inference, Hallucination Mitigation
Abstract: Reasoning-style language models improve tool-use agents, yet their visible chain-of-thought (CoT) may not always be faithful; steps can be decorative or even misleading while the final answer remains unchanged. We therefore target the causal effect of visible CoT on answer groundedness, i.e., whether articulating specific reasoning steps changes the probability that the final answer is supported by the same fixed evidence, relative to answer-only generation or counterfactually edited CoT. This matters for systems that train from traces (distillation/SFT), deploy safety monitors that audit or shape rationales, and run tool agents that must follow an explicit plan: knowing whether the trace content causally improves groundedness tells us whether visible traces are a reliable steering signal.
Submission Number: 374