PokéChamp: an Expert-level Minimax Language Agent for Competitive Pokémon

Published: 30 Oct 2024 · Last Modified: 13 Dec 2024 · LanGame Poster · CC BY 4.0
Keywords: multiagent, LLM agents, competitive games, game theory, reinforcement learning
Abstract: We introduce \texttt{Pok\'eChamp}, a game-theory-aware agent powered by a Large Language Model (LLM) for two-player competitive Pok\'emon battles. It combines an LLM prior with collected high-Elo human data to perform minimax search without any additional neural network training. \texttt{Pok\'eChamp} runs a depth-limited minimax search online in which the LLM replaces three key components: 1) action sampling, guided by prompts that include output from a damage-calculation tool; 2) opponent modeling, using the historical likelihood of actions in our dataset to weight LLM-predicted opponent actions; and 3) state-value calculation, with the LLM reflecting on each intrinsic state. \texttt{Pok\'eChamp} outperforms all existing LLM-based bots (76\% win rate) and rule-based bots (84\% win rate) by a wide margin, including winning consistently (64\%) against prior human-parity work running a frontier model, GPT-4o, while itself using an open-source 8-billion-parameter Llama 3.1 model. \texttt{Pok\'eChamp} reaches an Elo of 1500 on the online ladder against competitive human players, placing it in the top 10\% of players, i.e., expert-level performance. Finally, we collect the largest Pok\'emon battling dataset, comprising over 1 million games including 150k+ high-Elo games, prepare a series of battling benchmarks based on real player data and puzzles to analyze specific battling abilities, and provide crucial updates to the local game engine. Our code is available \href{https://sites.google.com/view/pokechamp-llm}{online}.
Submission Number: 20
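
To make the abstract's search procedure concrete, below is a minimal Python sketch of a depth-limited minimax loop in which the three components are delegated to an LLM, as described above. All names here (`GameState`, `llm_propose_actions`, `llm_predict_opponent`, `llm_value`, `transition`) are hypothetical stand-ins for illustration, not the released PokéChamp code or API.

```python
# Sketch: depth-limited minimax where an LLM supplies the three components
# named in the abstract. Every function body is a stub; the real system
# would call the LLM, the high-Elo dataset, and the local game engine.

from dataclasses import dataclass
from typing import List


@dataclass
class GameState:
    """Placeholder for a Pokémon battle state (teams, HP, field effects)."""
    terminal: bool = False


def llm_propose_actions(state: GameState, k: int = 3) -> List[str]:
    """1) Action sampling: prompt the LLM (with damage-calculator output
    in the prompt) for the k most promising moves/switches."""
    raise NotImplementedError


def llm_predict_opponent(state: GameState, k: int = 3) -> List[str]:
    """2) Opponent modeling: candidate opponent actions, filtered by
    their historical likelihood in the collected human dataset."""
    raise NotImplementedError


def llm_value(state: GameState) -> float:
    """3) State-value calculation: ask the LLM to score a leaf state."""
    raise NotImplementedError


def transition(state: GameState, ours: str, theirs: str) -> GameState:
    """Advance the battle one turn via the local game engine."""
    raise NotImplementedError


def minimax_value(state: GameState, depth: int) -> float:
    """Depth-limited minimax: we maximize, the opponent minimizes."""
    if depth == 0 or state.terminal:
        return llm_value(state)
    return max(
        min(
            minimax_value(transition(state, ours, theirs), depth - 1)
            for theirs in llm_predict_opponent(state)
        )
        for ours in llm_propose_actions(state)
    )


def best_action(state: GameState, depth: int = 2) -> str:
    """Pick the sampled action with the best worst-case value."""
    return max(
        llm_propose_actions(state),
        key=lambda ours: min(
            minimax_value(transition(state, ours, theirs), depth - 1)
            for theirs in llm_predict_opponent(state)
        ),
    )
```

Because the LLM prunes both players' action sets to a few candidates per turn, the tree stays small enough to search online without any additional network training, consistent with the abstract's description.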