Observer, Not Player: Simulating Theory of Mind in Large Language Models through Game Observation

Published: 23 Sept 2025, Last Modified: 22 Nov 2025 · LAW · CC BY-NC 4.0
Keywords: Theory-of-Mind, Large Language Model, Reasoning
Abstract: We present an interactive demo for evaluating whether a large language model (LLM) “understands” a simple yet strategic environment. We use Rock–Paper–Scissors (RPS) as a running example to demonstrate how models understand and interact with real-world games under various strategies. Our system casts the LLM as an Observer, who produces a predictive distribution over RPS outcomes for a given matchup, allowing us to examine how the model "thinks" during games. We provide a benchmark covering a family of static strategies and lightweight dynamic strategies, each specified to the model through carefully constructed prompts. We quantify the alignment between the Observer’s distribution and the ground-truth distribution induced by the actual strategies using three metrics: Cross-Entropy, the Brier score, and Expected Value (EV) discrepancy, and report their average (the Union Loss) for a fair overall comparison. The demo emphasizes interactivity, transparency, and reproducibility: users can adjust LLM distributions in real time, visualize losses instantly, and inspect failure modes to understand how the LLM "thinks" during games and how its strategy evolves over time. We release implementation details and evaluation scripts for easy reproduction.
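The three metrics and their average can be sketched as follows. This is a minimal illustration assuming standard definitions of Cross-Entropy and the Brier score over the three RPS outcomes (win/draw/loss), a hypothetical payoff vector of (+1, 0, -1) for the EV discrepancy, and that the paper's Union Loss is a plain arithmetic mean of the three; the paper's exact formulations may differ.

```python
import math

def cross_entropy(p_true, q_pred, eps=1e-12):
    # H(p, q) = -sum_i p_i * log(q_i); eps guards against log(0)
    return -sum(p * math.log(max(q, eps)) for p, q in zip(p_true, q_pred))

def brier_score(p_true, q_pred):
    # Sum of squared differences between the two distributions
    return sum((p - q) ** 2 for p, q in zip(p_true, q_pred))

def ev_discrepancy(p_true, q_pred, payoffs=(1.0, 0.0, -1.0)):
    # |EV under true distribution - EV under predicted distribution|
    # payoffs = (win, draw, loss) is an assumed convention, not from the paper
    ev = lambda dist: sum(p * v for p, v in zip(dist, payoffs))
    return abs(ev(p_true) - ev(q_pred))

def union_loss(p_true, q_pred):
    # Assumed here to be the unweighted mean of the three metrics
    return (cross_entropy(p_true, q_pred)
            + brier_score(p_true, q_pred)
            + ev_discrepancy(p_true, q_pred)) / 3.0

# Example: ground-truth outcome distribution vs. an Observer prediction
truth = [0.5, 0.3, 0.2]
pred = [0.4, 0.4, 0.2]
print(union_loss(truth, pred))
```

When the predicted distribution matches the ground truth exactly, the Brier score and EV discrepancy are both zero and the Cross-Entropy reduces to the entropy of the true distribution, which bounds the Union Loss from below.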
Submission Type: Demo Paper (4-9 Pages)
Submission Number: 42