Open Source Links: https://anonymous.4open.science/r/poker-interp-4653/
Keywords: Probing, Understanding high-level properties of models, Other
Other Keywords: world representation, LLM uncertainty, belief state geometry
TL;DR: LLM world models for stochastic games and partially observable MDPs
Abstract: Transformer-based large language models (LLMs) have demonstrated strong reasoning abilities across diverse fields, from solving programming challenges to competing in strategy-intensive games such as chess. Prior work has shown that LLMs can develop emergent world models in games of perfect information, where internal representations correspond to latent states of the environment. In this paper, we extend this line of investigation to domains of incomplete information, focusing on poker as a canonical partially observable Markov decision process (POMDP). We pretrain a GPT-style model on Poker Hand History (PHH) data and probe its internal activations. Our results demonstrate that the model learns both deterministic structure, such as hand ranks, and stochastic features, such as equity, without explicit instruction. Furthermore, using primarily nonlinear probes, we demonstrate that these representations are decodable and correlate with theoretical belief states, suggesting that LLMs learn their own representation of the stochastic environment of Texas Hold'em poker.
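The nonlinear probing described in the abstract can be illustrated with a minimal sketch: a small MLP trained on frozen-model activations to predict a latent game property (here, a hand-rank class). All dimensions, labels, and hyperparameters below are illustrative assumptions, not the paper's actual setup; real probes would use activations extracted from the pretrained PHH model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions (hypothetical, for illustration only):
# hidden_dim  - width of the transformer's residual stream
# probe_dim   - probe's hidden layer size
# num_ranks   - number of hand-rank classes (high card .. straight flush)
hidden_dim, probe_dim, num_ranks, batch = 64, 32, 10, 128

# Random stand-ins for frozen-model activations and hand-rank labels.
X = rng.standard_normal((batch, hidden_dim))
y = rng.integers(0, num_ranks, size=batch)

# One-hidden-layer MLP probe parameters; only these are trained.
W1 = rng.standard_normal((hidden_dim, probe_dim)) * 0.1
b1 = np.zeros(probe_dim)
W2 = rng.standard_normal((probe_dim, num_ranks)) * 0.1
b2 = np.zeros(num_ranks)

def forward(X):
    h = np.maximum(X @ W1 + b1, 0.0)  # ReLU hidden layer (the nonlinearity)
    return h, h @ W2 + b2             # hidden features, class logits

def loss_fn(logits, y):
    # Numerically stable softmax cross-entropy.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

_, logits = forward(X)
init_loss = loss_fn(logits, y)

# Plain gradient descent on the probe; the LLM itself stays frozen.
lr = 0.1
for _ in range(100):
    h, logits = forward(X)
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    probs[np.arange(len(y)), y] -= 1.0
    grad_logits = probs / len(y)
    gW2, gb2 = h.T @ grad_logits, grad_logits.sum(axis=0)
    grad_h = grad_logits @ W2.T
    grad_h[h <= 0] = 0.0              # ReLU backward
    gW1, gb1 = X.T @ grad_h, grad_h.sum(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, logits = forward(X)
final_loss = loss_fn(logits, y)
```

If the probe's loss falls well below chance level (log of the number of classes), the target property is decodable from the activations; comparing linear versus nonlinear probes then indicates how the representation is structured.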
Submission Number: 250