Models with a Cause: Causal Discovery with Language Models on Temporally Ordered Text Data

TMLR Paper 6628 Authors

24 Nov 2025 (modified: 04 Dec 2025) · Under review for TMLR · CC BY 4.0
Abstract: While language models (LMs) have been proposed for causal discovery tasks, it remains unclear whether they possess the inductive biases needed to identify causal structure in token-generation processes. We investigate whether LMs can learn the causal structure governing how tokens depend on their predecessors by testing whether they exhibit the temporal and statistical properties required for causal discovery. We prove that existing algorithms can recover a unique causal model when token sequences satisfy standard causal assumptions and are temporally ordered; LMs' sequential processing and positional encodings allow them to exploit this temporal information. Using controlled experiments on synthetic data generated by mixtures of Markov chains, we test whether LMs learn the conditional independencies and Markov exchangeability properties necessary for causal discovery. We find that transformers do learn these properties, not by approximating exact probability distributions but by learning qualitative probability rankings. These synthetic experiments provide initial evidence that LMs possess inductive biases suitable for discovering token-level causal structures.
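The experimental setup described in the abstract (sequences drawn from a mixture of Markov chains, evaluated on qualitative probability rankings rather than exact probability values) can be illustrated with a small synthetic-data sketch. The code below is not the authors' implementation; the vocabulary size, number of chains, sequence length, and the rank-agreement check are illustrative assumptions chosen for this example.

```python
# Minimal sketch (assumed setup, not the paper's code): sample token sequences
# from a mixture of Markov chains, then check whether estimated next-token
# probabilities agree with the true ones in *ranking* rather than in value.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, N_CHAINS, SEQ_LEN, N_SEQ = 6, 3, 50, 3000  # illustrative sizes

# One row-stochastic transition matrix per mixture component, plus mixture weights.
transitions = rng.dirichlet(np.ones(VOCAB), size=(N_CHAINS, VOCAB))
mix_weights = rng.dirichlet(np.ones(N_CHAINS))

def sample_sequence():
    """Draw a latent chain, then roll out a token sequence from it."""
    k = rng.choice(N_CHAINS, p=mix_weights)
    seq = [rng.integers(VOCAB)]
    for _ in range(SEQ_LEN - 1):
        seq.append(rng.choice(VOCAB, p=transitions[k, seq[-1]]))
    return k, np.array(seq)

data = [sample_sequence() for _ in range(N_SEQ)]

# Per-chain empirical transition counts (here we use the known latent chain label,
# which a model trained only on tokens would have to infer or marginalise over).
counts = np.zeros((N_CHAINS, VOCAB, VOCAB))
for k, seq in data:
    np.add.at(counts[k], (seq[:-1], seq[1:]), 1)
empirical = counts / np.maximum(counts.sum(axis=2, keepdims=True), 1)

# Qualitative check: do the orderings of next-token probabilities match the true
# transition matrices, even where the estimated values differ?
rank_agreement = np.mean(
    np.argsort(-empirical, axis=2) == np.argsort(-transitions, axis=2)
)
print(f"fraction of agreeing rank positions: {rank_agreement:.3f}")
```

A ranking-based check like this only asks whether the relative order of next-token probabilities is recovered, which mirrors the abstract's claim that transformers learn qualitative probability rankings rather than exact distributions.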
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: N/A
Assigned Action Editor: ~Krikamol_Muandet1
Submission Number: 6628