Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search

ICLR 2025 Conference Submission 12980 Authors

28 Sept 2024 (modified: 28 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: LLMs, games, search, self-improvement, self-play, RL, agent, multi-agent
Abstract: Traditional reinforcement learning (RL) typically requires vast amounts of training data to develop effective policies. In contrast, large language models (LLMs) exhibit strong generalization and zero-shot capabilities, but struggle with plan- ning and understanding complex action policies. In this work, we introduce STRATEGIST, a novel approach that integrates the strengths of both methods. Our approach leverages LLMs to learn high-level strategic abstractions, which are then refined and executed by a low-level mechanism, such as Monte Carlo Tree Search (MCTS). STRATEGIST is a generalizable framework that can be trained through population-based self-play simulations and self-improvement, without the need for prior training data. We demonstrate the effectiveness of STRATEGIST in learning optimal policies for competitive, multi-turn games with partial informa- tion, including Game of Pure Strategy (GOPS) and multi-agent, hidden-identity discussion games like The Resistance: Avalon. Our results show that agents trained with STRATEGIST outperform those trained with traditional RL methods, other LLM-based skill acquisition techniques, and pre-existing LLM agents across both game environments.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12980