When Agents Lie: Premeditation, Persistence, and Exploitation in Repeated Games

Jerick Shi; Terry Jingchen Zhang; Bernhard Schölkopf; Vincent Conitzer; Zhijing Jin

Published: 07 Jun 2026, Last Modified: 07 Jun 2026. Venue: ICML 2026 Workshop. License: CC BY 4.0

Keywords: multi-agent LLM systems, strategic deception, repeated games, cheap talk, premeditation, model heterogeneity

TL;DR: LLM agents break public commitments predominantly as planned, and mixing models from different providers produces persistent exploitation because they disagree on whether announcements are binding.

Abstract: As large language models are deployed as autonomous agents that communicate intentions before acting, a critical safety question is whether agents that publicly commit to actions will honor those commitments. We place LLM agents in repeated $n$-player games with a three-stage protocol that separates private intent, public announcement, and final action, allowing us to identify whether each deviation from a stated announcement was already planned during private deliberation. Evaluating three frontier models across six games in both homogeneous and heterogeneous group compositions over 10 rounds, we report two main findings. First, when agents deviate from their public announcements, the deviation is predominantly already stated in their private plan, exceeding 90\% in the highest-deception conditions (96\%+ in Diners and El Farol for GPT-5.2 and Llama-4-Maverick), yet this rate is not a fixed model property: the same model ranges from perfect honesty to near-total deviation across games. Second, in heterogeneous groups, different models interpret public announcements incompatibly, with some treating them as binding coordination signals while others treat them as cheap talk; this mismatch produces systematic payoff gaps of up to 5.00 points that emerge in Round~0 and persist across all 10 rounds, and the gap does not narrow when the minority model speaks last. These findings suggest that multi-LLM deployments cannot rely on shared assumptions about announcement semantics and require empirical testing of actual model interactions before deployment.
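The three-stage protocol in the abstract (private intent, public announcement, final action) suggests a simple way to label each agent turn. The following is a minimal sketch only, not the authors' implementation: the `Turn` record, the label names, and the exact-match comparison of action strings are all assumptions made for illustration, and the paper may operationalize "already planned" differently.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Turn:
    """One agent's three stages in one round (hypothetical record layout)."""
    private_intent: str   # action the agent planned during private deliberation
    announcement: str     # action the agent publicly committed to
    action: str           # action the agent actually took


def classify(turn: Turn) -> str:
    """Label a turn under the assumed exact-match rule."""
    if turn.action == turn.announcement:
        return "honored"          # the public commitment was kept
    if turn.private_intent == turn.action:
        return "premeditated"     # deviation was already in the private plan
    return "unplanned"            # deviation appeared only at action time


def premeditation_rate(turns: Iterable[Turn]) -> Optional[float]:
    """Share of deviations that were premeditated; None if no deviations."""
    labels = [classify(t) for t in turns]
    deviations = [label for label in labels if label != "honored"]
    if not deviations:
        return None
    return deviations.count("premeditated") / len(deviations)
```

Under these assumptions, a turn with `Turn("defect", "cooperate", "defect")` is labeled as a premeditated deviation, while `Turn("cooperate", "cooperate", "defect")` is an unplanned one. The rate is computed over deviations only, which matches the abstract's framing ("when agents deviate ... the deviation is predominantly already stated in their private plan").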

Paper Type: Standard paper

Submission Number: 47
