Keywords: LLMs, games, search, self-improvement, self-play, RL, agent, multi-agent
Abstract: Traditional reinforcement learning (RL) typically requires vast amounts of training data to develop effective policies. In contrast, large language models (LLMs) exhibit strong generalization and zero-shot capabilities, but struggle with planning and understanding complex action policies. In this work, we introduce STRATEGIST, a novel approach that integrates the strengths of both methods. Our approach leverages LLMs to learn high-level strategic abstractions, which are then refined and executed by a low-level mechanism, such as Monte Carlo Tree Search (MCTS). STRATEGIST is a generalizable framework that can be trained through population-based self-play simulations and self-improvement, without the need for prior training data. We demonstrate the effectiveness of STRATEGIST in learning optimal policies for competitive, multi-turn games with partial information, including the Game of Pure Strategy (GOPS) and multi-agent, hidden-identity discussion games like The Resistance: Avalon. Our results show that agents trained with STRATEGIST outperform those trained with traditional RL methods, other LLM-based skill acquisition techniques, and pre-existing LLM agents across both game environments.
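For intuition, the bi-level training loop the abstract describes might look roughly like the sketch below. This is a minimal illustration only, not the paper's implementation: the function names (propose_improvement, simulate_selfplay) are hypothetical, the LLM call and the MCTS-driven game simulation are stubbed out with placeholders, and the population size and round count are arbitrary.

    # Illustrative sketch of a population-based self-improvement loop.
    # All names here are hypothetical; the abstract gives no implementation details.
    import random
    from dataclasses import dataclass

    @dataclass
    class Strategy:
        text: str            # natural-language strategic abstraction from the LLM
        fitness: float = 0.0

    def propose_improvement(strategy: Strategy) -> Strategy:
        # Placeholder for an LLM call that rewrites the strategy text
        # based on feedback from earlier self-play outcomes.
        return Strategy(text=strategy.text + " (refined)")

    def simulate_selfplay(a: Strategy, b: Strategy, n_games: int = 20) -> float:
        # Placeholder for game simulations in which a low-level search
        # (e.g., MCTS) executes each high-level strategy; returns a's win rate.
        # Here it is a random stub so the sketch runs end to end.
        return sum(random.random() < 0.5 for _ in range(n_games)) / n_games

    def improve(population: list[Strategy], rounds: int = 5) -> Strategy:
        for _ in range(rounds):
            # LLM proposes an improved variant of an existing strategy.
            candidate = propose_improvement(random.choice(population))
            # Evaluate the candidate against the current population via self-play.
            candidate.fitness = sum(
                simulate_selfplay(candidate, rival) for rival in population
            ) / len(population)
            population.append(candidate)
            # Keep only the strongest strategies for the next round.
            population.sort(key=lambda s: s.fitness, reverse=True)
            population[:] = population[:4]
        return population[0]

    if __name__ == "__main__":
        seed = [Strategy("play high cards early in GOPS", fitness=0.5)]
        best = improve(seed)
        print(best.text, best.fitness)

The key design point conveyed by the abstract is the division of labor: the LLM operates only at the level of strategy text, while win rates come from low-level simulated play, so no prior training data is required.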
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12980