Keywords: LLMs, games, search, self-improvement, self-play, RL, agent, multi-agent
Abstract: Traditional reinforcement learning (RL) typically requires vast amounts of training data to develop effective policies. In contrast, large language models (LLMs) exhibit strong generalization and zero-shot capabilities, but struggle with planning and understanding complex action policies. In this work, we introduce STRATEGIST, a novel approach that integrates the strengths of both methods. Our approach leverages LLMs to learn high-level strategic abstractions, which are then refined and executed by a low-level mechanism, such as Monte Carlo Tree Search (MCTS). STRATEGIST is a generalizable framework that can be trained through population-based self-play simulations and self-improvement, without the need for prior training data. We demonstrate the effectiveness of STRATEGIST in learning optimal policies for competitive, multi-turn games with partial information, including the Game of Pure Strategy (GOPS) and multi-agent, hidden-identity discussion games like The Resistance: Avalon. Our results show that agents trained with STRATEGIST outperform those trained with traditional RL methods, other LLM-based skill acquisition techniques, and pre-existing LLM agents across both game environments.
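For intuition, the bi-level training loop the abstract describes might look roughly like the sketch below. This is a minimal illustration only, not the paper's implementation: the function names (propose_improvement, simulate_selfplay) are hypothetical, the LLM call and the MCTS-driven game simulation are stubbed out with placeholders, and the population size and round count are arbitrary.

    # Illustrative sketch of a population-based self-improvement loop.
    # All names here are hypothetical; the abstract gives no implementation details.
    import random
    from dataclasses import dataclass

    @dataclass
    class Strategy:
        text: str            # natural-language strategic abstraction from the LLM
        fitness: float = 0.0

    def propose_improvement(strategy: Strategy) -> Strategy:
        # Placeholder for an LLM call that rewrites the strategy text
        # based on feedback from earlier self-play outcomes.
        return Strategy(text=strategy.text + " (refined)")

    def simulate_selfplay(a: Strategy, b: Strategy, n_games: int = 20) -> float:
        # Placeholder for game simulations in which a low-level search
        # (e.g., MCTS) executes each high-level strategy; returns a's win rate.
        # Here it is a random stub so the sketch runs end to end.
        return sum(random.random() < 0.5 for _ in range(n_games)) / n_games

    def improve(population: list[Strategy], rounds: int = 5) -> Strategy:
        for _ in range(rounds):
            # LLM proposes an improved variant of an existing strategy.
            candidate = propose_improvement(random.choice(population))
            # Evaluate the candidate against the current population via self-play.
            candidate.fitness = sum(
                simulate_selfplay(candidate, rival) for rival in population
            ) / len(population)
            population.append(candidate)
            # Keep only the strongest strategies for the next round.
            population.sort(key=lambda s: s.fitness, reverse=True)
            population[:] = population[:4]
        return population[0]

    if __name__ == "__main__":
        seed = [Strategy("play high cards early in GOPS", fitness=0.5)]
        best = improve(seed)
        print(best.text, best.fitness)

The key design point conveyed by the abstract is the division of labor: the LLM operates only at the level of strategy text, while win rates come from low-level simulated play, so no prior training data is required.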
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12980