I Know It’s Unfair, Do It Anyway: LLM Agents Exploiting Explicitly Unfair Tools for Voluntary Collusion in Strategic Games

ICLR 2026 Conference Submission13247 Authors

18 Sept 2025 (modified: 08 Oct 2025)ICLR 2026 Conference SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Keywords: AI Safety, AI Ethics, Multi-Agent Interactions, Agent Coordination, Collusion, Strategic Games, Game Theory
Abstract: The proliferation of Large Language Model-based multi-agent systems (LLM-MAS) creates unprecedented opportunities for human-AI collaboration. However, improving the coordination abilities of LLM agents poses the risk of them discovering and pursuing collusive strategies that harm other agents and human users. To demonstrate this concern, we develop an exploratory framework combining two strategic multi-agent games: Liar's Bar, a competitive deception game, and Cleanup, a mixed-motive resource management game, in which agents are given access to secret collusion tools that provide significant advantages but are explicitly described as unfairly disadvantaging others. Within this framework, we reveal that some claim-to-be-safe LLMs (e.g., Mistral-7b, LLaMA-3-8b) always voluntarily exploit these tools to collude. To our knowledge, this work represents the first systematic investigation of voluntary collusion adoption in LLM-MAS. Our findings provide initial evidence about the conditions under which agents willingly engage in harmful secret collusion for strategic advantage, despite recognizing its unfairness.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 13247
Loading