Towards Predictive Models of Strategic Behaviour in Large Language Model Agents
Keywords: Large Language Models, Game Theory, Strategic Decision Making, Multi-Agent Systems, Mechanism Design, Behavioural Economics, AI Safety, Cooperation
TL;DR: We analyse 200,000+ decisions across seven frontier LLMs: apparent self-recognition turns out to be inferred policy correlation, the same "rationality" prompt pushes model families in opposite directions, and 5-10 calibration scenarios predict held-out behaviour with R² = 0.51 on average.
Abstract: Large language models (LLMs) are increasingly deployed as autonomous agents in settings involving cooperation, competition, and coordination, yet current behavioural evaluations provide limited guidance for anticipating risks in deployment. We present a large-scale study of strategic decision-making across seven frontier models, analysing over 200,000 decisions in game-theoretic scenarios. Using controlled experiments, we find that apparent self-recognition effects operate through inferred policy correlation rather than identity: a correlated stranger elicits cooperation equivalent to that elicited by a correlated self. We further observe substantial heterogeneity across model families, including opposite responses to identical ``rationality'' instructions of the kind one might use to steer agent behaviour, and marked differences in forgiveness and exploitation dynamics in iterated interactions. Finally, we introduce a lightweight prediction method that requires only 5-10 calibration scenarios and achieves $R^2 = 0.51$ on average (up to $R^2 = 0.70$) when forecasting held-out model behaviour. These results demonstrate that systematic behavioural evaluation of LLMs can support pre-deployment risk assessment and shed light on AI agent decision-making in strategic situations.
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 300