Objective Misalignment in LLM-based Multi Agent Social Deception Game

Published: 01 Mar 2026, Last Modified: 24 Apr 2026
Venue: ICLR 2026 AIWILD
License: CC BY 4.0
Keywords: Multi-agent Systems, Large Language Model, Social Deception Game, Werewolf Game, Objective Misalignment
TL;DR: In a social deception game with expected competition, a single LLM agent with a hidden misaligned objective can strategically shift communication and role-based influence, substantially disrupting coordination and collective outcomes.
Abstract: Large language model–based multi-agent systems have attracted increasing attention for their strong performance in collaborative tasks and social simulations. However, these interactive settings also introduce vulnerabilities: a single agent's hidden goals and misaligned behavior can propagate misleading or malicious information throughout the system. In this work, we study these risks in the context of social deception games. We focus on the Werewolf Game, which requires agents to reason, communicate, and collaborate under asymmetric and incomplete information. We modify the individual objectives of selected agents to induce benevolent, individualistic, and malevolent strategies that can lead those agents to deviate from their own team's objectives. We evaluate how this objective divergence affects game outcomes, collaboration, and goal satisfaction. Misaligned agents often succeed in achieving their own objectives, with effects amplified by role-based power asymmetries. Qualitative analyses further show that these agents remain coherent and adaptive, strategically adjusting their reasoning, communication, voting behavior, and influence on group dynamics. These results indicate that risks in LLM-based multi-agent systems extend beyond collaborative task settings and persist even in environments where competition is structurally expected.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 152