Evaluating the Causal Effect of Chain-of-Thought on Groundedness in Tool-Use Agents via Counterfactual Mutations
Keywords: Chain-of-Thought (CoT), Groundedness / Faithfulness, Retrieval-Augmented Generation (RAG), Counterfactual Evaluation / Causal Inference, Hallucination Mitigation
Abstract: Reasoning-style language models improve tool-use agents, yet their visible chain-of-thought (CoT) may not always be faithful; steps can be decorative or even misleading while the final answer remains unchanged. We therefore target the causal effect of visible CoT on answer groundedness, i.e., whether articulating specific reasoning steps changes the probability that the final answer is supported by the same fixed evidence, relative to answer-only generation or counterfactually edited CoT. This matters for systems that train from traces (distillation/SFT), deploy safety monitors that audit or shape rationales, and run tool agents that must follow an explicit plan: knowing whether the trace content causally improves groundedness tells us whether visible traces are a reliable steering signal.
Submission Number: 374