Rational Irrationality: Evaluating LLMs In Games With Strategic Behavior Discrepancies

18 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0
Keywords: llm, game theory, alignment, rock paper scissors, centipede game, traveler's dilemma
Abstract: Large language models (LLMs) are increasingly deployed in complex decision-making environments, which makes evaluating their strategic reasoning abilities increasingly important. A growing body of research investigates their performance in multi-objective settings, often based on or inspired by game theory, with evaluation typically focusing on the models' ability to align with theoretical expectations. This paper shifts the focus to the alignment between LLM behavior and human strategic thinking by analyzing LLM responses in a well-established game theory testbed. We revisit three notable games: Rock, Paper, Scissors (RPS), the Centipede Game (CG), and the Traveler’s Dilemma (TD), all of which are characterized by substantial discrepancies between empirical human behavior and theoretical predictions. For each game, we record the choices made by LLM agents and compare them with historical data from human subject experiments to uncover commonalities and particularities in their underlying strategic reasoning patterns. Our results indicate that LLMs are, in general, more aligned with game-theoretical expectations than humans and show limited sensitivity to game hyperparameters. In RPS, most LLMs imitate rational behavior but perform sub-optimally. In CG, LLMs likewise adopt rational strategies, learning from past interactions. Finally, in TD they cooperate toward a better payoff, though they adopt a more prudent strategy than humans.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 10818