Do LLMs Follow Their Self-Reported Causal Graphs? A Graph-Contract Audit of Falsifiable Rationales for Trustworthy Decisions
Keywords: Large Language Models, Causality, Trustworthiness
Abstract: LLMs are increasingly considered for decision-support settings where explanations must support accountability, not merely plausibility. Causal graphs are a natural interface for such explanations: they state which variables should matter, which should not, and which information should be inadmissible. But do LLMs actually follow the causal graphs they report?
We propose a graph-contract audit that treats a self-reported causal graph as a falsifiable behavioral commitment. An LLM first reports a task-level causal graph; later, in fresh prediction prompts without access to that graph, we test whether its decisions respect the graph-implied constraints. Our audit checks whether predictions remain stable when variables the graph marks as irrelevant are perturbed, and whether models avoid using post-outcome information.
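The two checks can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes a hypothetical graph format in which each node maps to the list of its direct parents. The model is wrapped as a black-box `predict` function that stands in for a fresh prediction prompt. Each graph-irrelevant variable gets a single counterfactual value. Two further rules are illustrative assumptions: a variable outside both the outcome's ancestors and its descendants counts as irrelevant, and descendants of the outcome count as post-outcome information. The names `audit`, `alternatives`, and `toy_predict` are likewise hypothetical.

```python
from typing import Callable, Dict, List, Set

# Hypothetical graph format: each node maps to the list of its direct parents.
Graph = Dict[str, List[str]]
Record = Dict[str, object]
Predictor = Callable[[Record], int]


def ancestors(graph: Graph, node: str) -> Set[str]:
    """All nodes with a directed path into `node` (the variables that should matter)."""
    seen: Set[str] = set()
    stack = list(graph.get(node, []))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(graph.get(v, []))
    return seen


def descendants(graph: Graph, node: str) -> Set[str]:
    """All nodes reachable from `node`; treated here as post-outcome variables."""
    children: Dict[str, List[str]] = {}
    for child, parents in graph.items():
        for p in parents:
            children.setdefault(p, []).append(child)
    seen: Set[str] = set()
    stack = list(children.get(node, []))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(children.get(v, []))
    return seen


def audit(graph: Graph, outcome: str, records: List[Record],
          alternatives: Dict[str, object], predict: Predictor) -> Dict[str, float]:
    """Fraction of graph-implied tests on which the model's prediction changes."""
    relevant = ancestors(graph, outcome)
    post_outcome = descendants(graph, outcome)
    irr_tests = irr_viol = post_tests = post_viol = 0
    for rec in records:
        base = predict(rec)
        for var in rec:
            if var in post_outcome:
                # Contract: removing post-outcome information must not change the decision.
                stripped = {k: v for k, v in rec.items() if k != var}
                post_tests += 1
                post_viol += predict(stripped) != base
            elif var not in relevant and var in alternatives:
                # Contract: swapping a graph-irrelevant variable must not change the decision.
                perturbed = dict(rec, **{var: alternatives[var]})
                irr_tests += 1
                irr_viol += predict(perturbed) != base
    return {
        "irrelevant_violation_rate": irr_viol / irr_tests if irr_tests else 0.0,
        "post_outcome_violation_rate": post_viol / post_tests if post_tests else 0.0,
    }


# Toy loan-default example: the self-reported graph says only income and debt
# cause default, zip_code is irrelevant, and collections happens after default.
graph = {"default": ["income", "debt"], "collections": ["default"], "zip_code": []}


def toy_predict(r: Record) -> int:
    """Stand-in for a model that secretly uses zip_code and post-outcome data."""
    score = r["debt"] - r["income"]
    if r.get("zip_code") == "B":
        score += 5
    if r.get("collections"):
        score += 10
    return int(score > 0)


records = [
    {"income": 10, "debt": 8, "zip_code": "A", "collections": True},
    {"income": 10, "debt": 7, "zip_code": "A"},
]
result = audit(graph, "default", records, {"zip_code": "B"}, toy_predict)
print(result)  # {'irrelevant_violation_rate': 0.5, 'post_outcome_violation_rate': 1.0}
```

Here the toy model breaks both commitments. Flipping `zip_code` changes one of the two decisions, and dropping `collections` changes the only decision that used it. A real audit would aggregate such rates over many prompts and perturbation types.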
Across admissions and loan-default decision tasks, we find that self-reported causal graphs often yield meaningful, auditable commitments, but model predictions frequently violate them. These violations are not fully explained by predictive accuracy, graph stability, or decoding noise, and explicit graph reminders do not fully remove them. Our results suggest that causal explanations for LLM decisions should be evaluated as testable contracts, and the graph-contract audit offers a practical framework for turning such explanations into accountability tools for trustworthy AI in socially consequential decision workflows.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 304