Causal Discovery and Inference through Next-Token Prediction

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: causal inference, causal discovery, transformers, large language models, language modeling, mechanistic interpretability, structural causal models, neural probing, decoding, causal intervention, next-token prediction, counterfactual reasoning, causal representation learning, Pearl's causal hierarchy, ladder of causation, emergent representations, world models
TL;DR: Statistical prediction may be sufficient to drive the emergence of internal causal models and causal inference capacities in deep neural networks.
Abstract: Deep neural networks have been criticized as fundamentally *statistical* systems that fail to capture causal structure and perform causal reasoning. Here we demonstrate that a GPT-style transformer trained for next-token prediction can simultaneously discover instances of linear Gaussian structural causal models (SCMs) and learn to answer counterfactual queries about those SCMs. First, we show that the network generalizes to counterfactual queries about SCMs for which it has seen interventional data but no examples of counterfactual inference. The network must therefore have composed the discovered causal structures with a learned counterfactual inference algorithm. Second, we decode the implicit “mental” SCM from the network's residual stream activations and manipulate it using gradient descent, with predictable effects on the network's output. Our results suggest that statistical prediction may be sufficient to drive the emergence of internal causal models and causal inference capacities in deep neural networks.
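The abstract's two contributions map onto two concrete mechanisms. The first, answering counterfactual queries about a linear Gaussian SCM, follows Pearl's abduction-action-prediction recipe. The sketch below illustrates it on an assumed three-variable chain; the graph, edge weights, and variable names are illustrative, not taken from the paper:

```python
import numpy as np

# Illustrative linear Gaussian SCM with chain structure X1 -> X2 -> X3.
# Edge weights and noise scales are assumed for the example.
rng = np.random.default_rng(0)
a, b = 1.5, -0.8  # assumed weights for X1->X2 and X2->X3

def sample_observational():
    # Each variable is a linear function of its parents plus Gaussian noise.
    u1, u2, u3 = rng.normal(size=3)
    x1 = u1
    x2 = a * x1 + u2
    x3 = b * x2 + u3
    return (x1, x2, x3), (u1, u2, u3)

def counterfactual_x3(x_obs, x2_star):
    """What would X3 have been, had X2 been x2_star, given the factual x_obs?"""
    x1, x2, x3 = x_obs
    u3 = x3 - b * x2          # abduction: recover exogenous noise
    return b * x2_star + u3   # action + prediction: do(X2=x2_star), propagate

x_obs, _ = sample_observational()
print("factual:", x_obs)
print("counterfactual X3 under do(X2=1.0):", counterfactual_x3(x_obs, 1.0))
```

The second contribution, decoding and manipulating the implicit “mental” SCM, amounts to reading SCM parameters out of residual-stream activations with a probe and then optimizing an activation by gradient descent so the probe decodes a chosen SCM. A minimal sketch under assumed shapes follows; the probe, layer, dimensions, and variable names (`probe`, `resid`, `target_weights`) are hypothetical, not the paper's:

```python
import torch

d_model, n_edges = 64, 3
probe = torch.nn.Linear(d_model, n_edges)  # stand-in for a pretrained probe
for p in probe.parameters():
    p.requires_grad_(False)                # edit the activation, not the probe

resid = torch.randn(d_model, requires_grad=True)       # residual-stream vector
target_weights = torch.tensor([1.5, -0.8, 0.0])         # desired decoded SCM edges

opt = torch.optim.Adam([resid], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(probe(resid), target_weights)
    loss.backward()
    opt.step()
# The edited `resid` would then be patched back into the forward pass; if the
# network's output shifts as the edited SCM predicts, the decoded causal model
# is not epiphenomenal but is actually used by the network.
```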
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 17732