Causal World Representation in the GPT Model

Raanan Yehezkel Rohekar; Yaniv Gurwicz; Sungduk Yu; Vasudev Lal

Published: 10 Oct 2024, Last Modified: 31 Oct 2024

Venue: CaLM @NeurIPS 2024 Poster

License: CC BY 4.0

Keywords: Causality, GPT, causal discovery, neural networks

TL;DR: Are GPT models only trained to predict the next token, or do they implicitly learn a causal world model from which a sequence is generated one token at a time?

Abstract: Are generative pre-trained transformer (GPT) models only trained to predict the next token, or do they implicitly learn a world model from which a sequence is generated one token at a time? We examine this question by deriving a causal interpretation of the attention mechanism in GPT and suggesting a causal world model that arises from this interpretation. Furthermore, we propose that GPT models can be used at inference time for zero-shot causal structure learning on in-distribution sequences. Empirical evaluation is conducted in a controlled synthetic environment using the setup and rules of the Othello board game. A GPT, pre-trained on real-world games played with the intention of winning, is tested on synthetic data that only adheres to the game rules. We find that the GPT model is likely to generate moves that adhere to the game rules for sequences in which a causal structure is encoded in the attention mechanism with high confidence. Conversely, in cases where the GPT model generates moves that do not adhere to the game rules, it generally also fails to capture any causal structure.
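As a rough illustration of the object the abstract refers to, the following minimal sketch computes a causally masked attention matrix for a single head, in which each position attends only to itself and to earlier positions. The function names, the all-zero scores, and the thresholding step are assumptions made for illustration only; the thresholding is a stand-in heuristic and not the structure-learning procedure derived in the paper.

```python
import math


def causal_attention(scores):
    """Row-wise softmax over a square score matrix with a causal mask.

    Position i may only attend to positions j <= i, so the returned
    matrix is lower triangular and every row sums to 1.
    """
    n = len(scores)
    attn = []
    for i in range(n):
        visible = scores[i][: i + 1]           # entries at or before position i
        peak = max(visible)                    # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        attn.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return attn


def candidate_edges(attn, threshold=0.3):
    """Hypothetical heuristic (not the paper's method): list (earlier, later)
    position pairs whose attention weight is at least `threshold`."""
    return [
        (j, i)
        for i in range(len(attn))
        for j in range(i)
        if attn[i][j] >= threshold
    ]


if __name__ == "__main__":
    # Toy scores for a 3-token sequence; the values are arbitrary.
    toy_scores = [[0.0, 0.0, 0.0] for _ in range(3)]
    weights = causal_attention(toy_scores)
    for row in weights:
        print([round(w, 3) for w in row])
    print(candidate_edges(weights))
```

Because of the mask, any relation read off such a matrix can only point from an earlier token to a later one, which is what makes a directed, order-respecting interpretation of attention plausible in the first place.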

Submission Number: 42
