Strategema: Probabilistic Analysis of Adversarial Multi-Agent Behavior with LLMs in Social Deduction Games
Keywords: Adversarial, Multi-Agent Behavior, LLM, Social Deduction Games (SDG)
Abstract: Social deduction games such as Mafia are rich testbeds for adversarial multi-agent interactions, featuring deception, coalition formation, and reasoning under uncertainty. While Large Language Models (LLMs) have shown promise in modeling human-like behavior, their use as a laboratory for \textit{quantitative}, \textit{probabilistic} analysis of adversarial strategies remains underexplored. We introduce \textbf{Strategema}, a simulation framework that leverages LLMs to power agents that maintain explicit Bayesian belief models over other players' roles and use them to make informed decisions. Through extensive experiments (400 games across four configurations) varying player counts and adversary ratios, we uncover fundamental patterns in deception, trust dynamics, and strategic convergence. We move beyond descriptive analysis to show that the \textit{trajectory of an agent's belief state} is a powerful predictor of game outcomes. Furthermore, we identify systematic biases in LLM-based reasoning, including a confirmation bias that impedes belief updating. Our framework provides a novel paradigm for benchmarking strategic reasoning and offers insights into the mechanics of deception in multi-agent systems, with implications for AI safety and multi-agent interaction research.
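For illustration, below is a minimal sketch of the kind of explicit Bayesian role-belief tracking the abstract describes: each agent keeps a posterior probability that every other player is an adversary and updates it from observed actions via Bayes' rule. The class name, role labels, likelihood values, and player names are hypothetical and not taken from the paper.

```python
class RoleBeliefModel:
    """Hypothetical per-agent Bayesian belief over other players' hidden roles.

    Assumes two roles ("villager", "mafia") and caller-supplied likelihoods
    P(observation | role); these choices are illustrative assumptions only.
    """

    def __init__(self, players, prior_mafia=0.25):
        # posterior[p] = P(player p is mafia | observations so far)
        self.posterior = {p: prior_mafia for p in players}

    def update(self, player, likelihoods):
        """Bayes update for one player.

        likelihoods: dict mapping role -> P(observation | role),
        e.g. {"mafia": 0.7, "villager": 0.4} for a deflecting statement.
        """
        p_mafia = self.posterior[player]
        num = likelihoods["mafia"] * p_mafia
        den = num + likelihoods["villager"] * (1.0 - p_mafia)
        self.posterior[player] = num / den if den > 0 else p_mafia

    def most_suspicious(self):
        # Informs decisions such as whom to accuse or vote against.
        return max(self.posterior, key=self.posterior.get)


if __name__ == "__main__":
    beliefs = RoleBeliefModel(players=["Ana", "Ben", "Caro"])
    # Ben deflects an accusation: assumed more likely if Ben is mafia.
    beliefs.update("Ben", {"mafia": 0.7, "villager": 0.4})
    print(beliefs.posterior)          # Ben's posterior rises above the prior
    print(beliefs.most_suspicious())  # -> "Ben"
```

In this sketch, the "trajectory of an agent's belief state" mentioned in the abstract would correspond to the sequence of posterior dictionaries recorded after each update.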
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 11608