Combining Code Generating Large Language Models and Self-Play to Iteratively Refine Strategies in Games
Abstract: We propose a self-play approach to generating strategies for multi-player games, where strategies are represented as computer code. We use large language models (LLMs) to generate pieces of code that play the game, which we refer to as generated bots. We engage the LLM-generated bots in competitions designed to yield increasingly strong strategies. We follow game-theoretic principles in organizing these tournaments, using a Policy Space Response Oracle (PSRO) approach. We start with an initial set of LLM-generated bots and proceed in rounds, adding new bots to the population. Each round adds a bot by asking the LLM to produce code that plays against a bot representing the Nash equilibrium mixture over the current population. Our analysis shows that even a few rounds are sufficient to produce strong bots for playing the game. Our demo illustrates the process for the game of Checkers. We allow users to select the initial bots in the population, run the process, inspect how the bots evolve over time, and play against the generated bots.
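The PSRO-style loop described above can be sketched as follows. This is a minimal illustration under our own assumptions: the function names (`psro_loop`, `nash_mixture`, `llm_best_response`), the fictitious-play equilibrium approximation, and the stubbed LLM oracle are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np


def nash_mixture(payoff: np.ndarray, iters: int = 2000) -> np.ndarray:
    """Approximate a Nash mixture over the population for a zero-sum
    meta-game via fictitious play on the empirical payoff matrix.
    (A stand-in for whatever exact equilibrium solver is used.)"""
    n = payoff.shape[0]
    counts = np.zeros(n)
    counts[0] = 1.0
    for _ in range(iters):
        mix = counts / counts.sum()
        # Pure best response to the opponent's empirical mixture.
        br = int(np.argmax(payoff @ mix))
        counts[br] += 1.0
    return counts / counts.sum()


def psro_loop(initial_bots, play, llm_best_response, rounds=3):
    """Sketch of the round-based loop: estimate the meta-game payoffs,
    compute the Nash mixture over the current population, then ask the
    (here stubbed) LLM oracle for a bot that beats that mixture."""
    population = list(initial_bots)
    for _ in range(rounds):
        # Empirical payoff matrix: payoff[i, j] = score of bot i vs bot j.
        payoff = np.array([[play(a, b) for b in population]
                           for a in population])
        mix = nash_mixture(payoff)
        # In the real system this call would prompt an LLM for new code.
        population.append(llm_best_response(population, mix))
    return population
```

As a toy usage example, bots can be numbers with the higher number winning, and the "LLM" oracle can simply return a bot one larger than any in the population; three rounds then grow an initial single-bot population to four bots, the last of which beats all predecessors.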