Causal Discovery and Inference through Next-Token Prediction

Published: 30 Sept 2025 · Last Modified: 20 Nov 2025 · Mech Interp Workshop (NeurIPS 2025) Poster · License: CC BY 4.0
Keywords: Causal interventions, Probing, Foundational work
Other Keywords: causal inference, causal discovery, transformers, large language models, language modeling, mechanistic interpretability, structural causal models, neural probing, decoding, causal intervention, next-token prediction, counterfactual reasoning, causal representation learning, Pearl's causal hierarchy, ladder of causation, emergent representations, world models
TL;DR: Deep neural networks can discover causal models and learn causal inference through next-token prediction
Abstract: Deep neural networks have been criticized as fundamentally *statistical* systems that fail to capture causal structure and perform causal reasoning. Here we demonstrate that a GPT-style transformer trained for next-token prediction can simultaneously discover instances of linear Gaussian structural causal models (SCMs) and learn to answer counterfactual queries about those SCMs. First, we show that the network generalizes to counterfactual queries about SCMs for which it has seen interventional data but no examples of counterfactual inference. The network must therefore have successfully composed the discovered causal structures with a learned counterfactual inference algorithm. Second, we decode the implicit “mental” SCM from the network's residual stream activations and manipulate it using gradient descent, with predictable effects on the network's output. Our results suggest that statistical prediction may be sufficient to drive the emergence of internal causal models and causal inference capacities in deep neural networks.
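For readers unfamiliar with the setting, below is a minimal sketch of a linear Gaussian SCM and of the standard abduction–action–prediction recipe for answering counterfactual queries of the kind the abstract describes. The three-variable graph, variable names, and coefficients are illustrative assumptions, not the paper's actual training setup.

```python
# Illustrative sketch (not the paper's code): a 3-variable linear Gaussian SCM
# and the abduction-action-prediction recipe for a counterfactual query.
# The graph, coefficients, and noise scales are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Structural coefficients of the SCM:  X1 -> X2,  X1 -> X3,  X2 -> X3
a, b, c = 0.8, -0.5, 1.2

def sample_observation():
    """Forward-sample one observation from the SCM (exogenous noise ~ N(0, 1))."""
    u1, u2, u3 = rng.normal(size=3)
    x1 = u1
    x2 = a * x1 + u2
    x3 = b * x1 + c * x2 + u3
    return (x1, x2, x3)

def counterfactual_x3(x1, x2, x3, x2_cf):
    """What X3 would have been had X2 been set to x2_cf, holding the noise fixed.

    Abduction:  recover the exogenous noise consistent with the observation.
    Action:     replace the structural equation for X2 with X2 := x2_cf.
    Prediction: re-evaluate downstream variables with the recovered noise.
    """
    u3 = x3 - b * x1 - c * x2          # abduction (linear equations invert exactly)
    return b * x1 + c * x2_cf + u3     # action + prediction

obs = sample_observation()
print("observed (x1, x2, x3):", obs)
print("counterfactual X3 | do(X2 = 0):", counterfactual_x3(*obs, x2_cf=0.0))
```

In the paper's framing, the transformer would have to recover something equivalent to the structural coefficients from data it has seen, then apply a counterfactual procedure like the one above to SCM instances for which it was never shown counterfactual examples.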
Submission Number: 239