TL;DR: We examine whether GPT models, trained only to predict the next token, implicitly learn a world model from which a sequence is generated one token at a time.
Abstract: Are generative pre-trained transformer (GPT) models, trained only to predict the next token, implicitly learning a world model from which sequences are generated one token at a time? We address this question by deriving a causal interpretation of the attention mechanism in GPT and presenting a causal world model that arises from this interpretation. Furthermore, we propose that GPT models, at inference time, can be utilized for zero-shot causal structure learning of input sequences, and we introduce a corresponding confidence score. Empirical tests were conducted in controlled environments using the setups of the Othello and Chess strategy games. A GPT, pre-trained on real-world games played with the intention of winning, was tested on out-of-distribution synthetic data consisting of sequences of random legal moves. We find that the GPT model is likely to generate legal next moves for out-of-distribution sequences whose causal structure is encoded in the attention mechanism with high confidence; in cases where it generates illegal moves, it also fails to capture a causal structure.
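To make the idea of reading candidate causal structure off attention at inference time concrete, here is a minimal sketch. It uses the publicly available GPT-2 via HuggingFace transformers as a stand-in model, averages attention over layers and heads, and applies an ad-hoc threshold to propose a parent set per token; the averaging scheme, the threshold value, and the toy input are illustrative assumptions, not the paper's derivation or confidence score.

```python
# Hypothetical sketch (not the paper's exact procedure): read attention maps
# from a pretrained GPT at inference time and threshold them to propose a
# parent set (candidate causal structure) for each token in the sequence.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

text = "e4 e5 Nf3 Nc6 Bb5"            # toy move-like sequence; any text works
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: tuple of per-layer tensors, each (batch, heads, seq, seq).
# Average over layers and heads to get one (seq, seq) matrix; the causal
# (autoregressive) mask already makes it lower-triangular.
attn = torch.stack(out.attentions).mean(dim=(0, 2))[0]

threshold = 0.1                        # illustrative cutoff, an assumption
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for i, tok in enumerate(tokens):
    parents = [tokens[j] for j in range(i) if attn[i, j] > threshold]
    print(f"{tok!r}: candidate parents {parents}")
```

The thresholded edges form a DAG over token positions (earlier tokens as potential parents of later ones); the paper's actual structure-learning procedure and confidence score are available in the linked repository.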
Lay Summary: GPT models, like those used in chatbots, are becoming increasingly widespread, with applications expanding beyond natural language understanding into a variety of new domains. But as these models are applied to different fields, a crucial question arises: Can a GPT model---trained simply to predict the next item in a sequence---actually learn the underlying mechanisms of a domain, or is it merely guessing based on patterns in data? To investigate this, we discovered a surprising mathematical connection between the attention mechanism at the heart of GPT models and a framework that scientists use to represent cause-and-effect relationships. Building on this insight, we created a new method to uncover the hidden cause-and-effect structure within a sequence---without needing any extra training or examples. We tested our findings in the controlled environments of the strategy games Chess and Othello and found that the models could internalize the underlying game mechanics, rather than simply mimicking observed sequences. This suggests GPT models may be capable of a deeper understanding than previously thought, paving the way for applying them to uncover the underlying mechanisms in challenging scientific areas such as protein folding, drug design, and the design of new materials.
Link To Code: https://github.com/IntelLabs/causality-lab
Primary Area: General Machine Learning->Causality
Keywords: structural causal model, causal reasoning, GPT, mechanistic interpretability, explainable AI, causal discovery
Submission Number: 6421