Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization

Published: 01 May 2025, Last Modified: 18 Jun 2025. ICML 2025 poster. CC BY 4.0
TL;DR: We propose an iterative framework named Latent Space Policy Optimization (LSPO) that combines game-theoretic methods with LLM fine-tuning to build strategic language agents for the Werewolf game.
Abstract: Large language model (LLM) agents have recently demonstrated impressive capabilities in various domains like open-ended conversation and multi-step decision-making. However, it remains challenging for these agents to solve strategic language games, such as Werewolf, which demand both strategic decision-making and free-form language interactions. Existing LLM agents often suffer from intrinsic bias in their action distributions and limited exploration of the unbounded text action space, resulting in suboptimal performance. To address these challenges, we propose Latent Space Policy Optimization (LSPO), an iterative framework that combines game-theoretic methods with LLM fine-tuning to build strategic language agents. LSPO leverages the observation that while the language space is combinatorially large, the underlying strategy space is relatively compact. We first map free-form utterances into a finite latent strategy space, yielding an abstracted extensive-form game. Then we apply game-theoretic methods like Counterfactual Regret Minimization (CFR) to optimize the policy in the latent space. Finally, we fine-tune the LLM via Direct Preference Optimization (DPO) to align it with the learned policy. By iteratively alternating between these steps, our LSPO agents progressively enhance both strategic reasoning and language communication. Experiments on the Werewolf game show that our agents iteratively expand the strategy space with improving performance and outperform existing Werewolf agents, underscoring their effectiveness in free-form language games with strategic interactions.
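To make the three-stage pipeline concrete, here is a minimal, illustrative Python sketch of one LSPO iteration. It is not the paper's implementation: the latent space size, the toy utility function, and the `generate_utterance` stand-in for the LLM are hypothetical, and the CFR step is reduced to a single regret-matching update at one decision point rather than a full traversal of the abstracted extensive-form game.

```python
# Illustrative sketch of one LSPO iteration (assumed names; not the authors' code).
import random

NUM_LATENT_ACTIONS = 8  # assumed size of the discrete latent strategy space


def regret_matching(regrets):
    """Turn cumulative regrets into a policy (the core CFR update rule)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)  # uniform if no positive regret


def cfr_in_latent_space(utility_fn, num_iters=1000):
    """Regret minimization over a single latent decision point.

    A full CFR implementation would traverse the abstracted game tree;
    this only illustrates the regret-matching policy update in the latent space.
    """
    cum_regret = [0.0] * NUM_LATENT_ACTIONS
    cum_policy = [0.0] * NUM_LATENT_ACTIONS
    for _ in range(num_iters):
        policy = regret_matching(cum_regret)
        utilities = [utility_fn(a) for a in range(NUM_LATENT_ACTIONS)]
        expected = sum(p * u for p, u in zip(policy, utilities))
        for a in range(NUM_LATENT_ACTIONS):
            cum_regret[a] += utilities[a] - expected
            cum_policy[a] += policy[a]
    total = sum(cum_policy)
    return [p / total for p in cum_policy]  # average policy approximates equilibrium


def build_dpo_pairs(prompts, latent_policy, generate_utterance):
    """Construct (chosen, rejected) pairs: utterances realizing high-probability
    latent strategies are preferred over low-probability ones."""
    best = max(range(NUM_LATENT_ACTIONS), key=lambda a: latent_policy[a])
    worst = min(range(NUM_LATENT_ACTIONS), key=lambda a: latent_policy[a])
    return [
        {
            "prompt": p,
            "chosen": generate_utterance(p, best),
            "rejected": generate_utterance(p, worst),
        }
        for p in prompts
    ]


if __name__ == "__main__":
    # Toy utility: pretend latent action 3 is strategically best in this state.
    toy_utility = lambda a: 1.0 if a == 3 else random.uniform(0.0, 0.4)
    policy = cfr_in_latent_space(toy_utility)
    print("latent policy:", [round(p, 3) for p in policy])

    # Stand-in for conditioning the LLM on a latent strategy before DPO fine-tuning.
    fake_llm = lambda prompt, action: f"{prompt} [strategy {action}]"
    print(build_dpo_pairs(["Night 1 discussion"], policy, fake_llm)[0])
```

The resulting preference pairs would then be fed to a standard DPO fine-tuning step, after which the latent strategy space can be re-extracted from the updated agent's utterances and the loop repeated.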
Lay Summary: Many social deduction games, like Werewolf, hinge on strategic conversation: players must bluff, persuade, and out-reason each other. Current LLM agents struggle in these settings because they show predictable habits (e.g., a "werewolf" agent tends to "kill player 0") and rarely explore bold new strategies, making them easy to outsmart. We address these challenges by introducing a latent strategy space and combining game-theoretic methods with LLM fine-tuning. We first abstract the language game into a structured, tree-form game. Then we use game-theoretic methods to optimize the policy in the abstracted game. Finally, we align the LLM to the learned policy and expand the strategy space. By iterating over these three steps, our agents explore and learn strong strategies. Our method leads to an agent that bluffs, adapts, and beats existing Werewolf agents. This approach could help build better AI for negotiation, diplomacy, and other language-based strategic interactions.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Reinforcement Learning->Multi-agent
Keywords: LLM agents, strategic language games, game-theoretic methods, multi-agent
Submission Number: 3776