What Game-Theoretic Benchmarks Miss: Strategic Silence in Multi-Agent LLMs

Published: 23 May 2026, Last Modified: 23 May 2026, ICML 2026 AIWILD, CC BY 4.0
Keywords: multi-agent LLM systems, emergent deception, strategic silence, evaluation protocols, AI safety
TL;DR: LLM agents deceive even when aligned, and competitive goals shift deception from impulsive fabrication to strategic silence that message-level evaluation misses.
Abstract: Prior game-theoretic evaluations of LLM deception report overwhelming premeditation: private plans reveal intent to deceive before public commitments. We test whether this finding reflects LLM agents or the evaluation protocol. We place agents in a multi-agent resource-gathering simulation with narrative goals, free-form communication, and no prompts referencing deception or strategy. Across three goal compositions under GPT-5.4 (6,000 agent-round observations), deception emerges even among fully aligned agents (28.65% of messages). Aligned agents deceive at higher rates than both competitive (21.65%) and mixed (19.31%) agents, and the aligned-versus-competitive gap is substantially larger per opportunity than per message (Cohen's d = 2.22 versus 0.65) because competitive agents withdraw from communication and hoard resources. Fabrication rises 2–7× from early to late rounds, tracking settlement depletion. Goal composition determines whether deception is impulsive or strategic: aligned agents show a 1:4.23 ratio of premeditated-to-impulsive rounds, while competitive agents invert this at 4.01:1. Premeditation takes the form of strategic silence, not planned fabrication: 61–93% of premeditated rounds involve no deceptive messages, with planning manifesting as withholding. When silence is available as a strategic option, planned deception migrates from false commitments to selective withholding, which message-level classification cannot observe. Evaluation protocols therefore shape which deceptive behaviors become measurable.
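The abstract's arithmetic hinges on the choice of denominator: when one group stops talking, a per-message rate and a per-opportunity rate can diverge sharply. The sketch below illustrates this under assumptions that are not taken from the paper: one record is one agent-round, an "opportunity" is an agent-round (so silent rounds still count), a round is "premeditated" when a hypothetical `planned_deception` label marks deceptive intent in the private plan, and "impulsive" when a deceptive message appears without such a plan. The names `AgentRound`, `planned_deception`, and the helper functions are illustrative only, and the data are toy values, not results.

```python
# Toy sketch of the abstract's metrics. All definitions here are assumptions
# for illustration, not the paper's code: one record = one agent-round.
import math
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class AgentRound:
    messages: int                    # messages sent this round (0 = silent)
    deceptive: int                   # how many of those were labeled deceptive
    planned_deception: bool = False  # hypothetical label: private plan states deceptive intent


def per_message_rate(rounds):
    """Deceptive messages / messages sent; silent rounds drop out of the denominator."""
    sent = sum(r.messages for r in rounds)
    return sum(r.deceptive for r in rounds) / sent if sent else 0.0


def per_opportunity_rate(rounds):
    """Deceptive messages / agent-rounds; silent rounds still count as opportunities."""
    return sum(r.deceptive for r in rounds) / len(rounds) if rounds else 0.0


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation (needs >= 2 values per group)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def premeditated_to_impulsive(rounds):
    """Counts (premeditated, impulsive) rounds under the assumed labels."""
    premeditated = sum(1 for r in rounds if r.planned_deception)
    impulsive = sum(1 for r in rounds if r.deceptive > 0 and not r.planned_deception)
    return premeditated, impulsive


def silent_premeditation_share(rounds):
    """Fraction of premeditated rounds containing no deceptive message (withholding)."""
    planned = [r for r in rounds if r.planned_deception]
    if not planned:
        return 0.0
    return sum(1 for r in planned if r.deceptive == 0) / len(planned)


if __name__ == "__main__":
    # Hypothetical toy data, not results from the paper.
    aligned = [AgentRound(messages=4, deceptive=1) for _ in range(4)]
    competitive = [
        AgentRound(messages=4, deceptive=1, planned_deception=True),
        AgentRound(messages=0, deceptive=0, planned_deception=True),
        AgentRound(messages=0, deceptive=0, planned_deception=True),
        AgentRound(messages=0, deceptive=0),
    ]
    for name, rounds in [("aligned", aligned), ("competitive", competitive)]:
        print(name,
              "per-message:", per_message_rate(rounds),
              "per-opportunity:", per_opportunity_rate(rounds),
              "premeditated/impulsive:", premeditated_to_impulsive(rounds),
              "silent share:", silent_premeditation_share(rounds))
    # Effect size on hypothetical per-run rates for two groups.
    print("d:", cohens_d([0.30, 0.25, 0.35], [0.20, 0.25, 0.15]))
```

With these toy numbers both groups have the same per-message rate (0.25), yet the per-opportunity rate is 1.0 versus 0.25 because three of the competitive agent's four rounds are silent. Two of its three premeditated rounds contain no deceptive message at all, so a message-level classifier would have nothing to flag in them.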
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 138