Keywords: multiagent, LLM agents, competitive games, game theory, reinforcement learning
Abstract: We introduce \texttt{Pok\'eChamp}, a game-theory-aware agent powered by a Large Language Model (LLM) for two-player competitive Pok\'emon battles. It uses an LLM prior and collected high-Elo human data to guide minimax search without any additional training. \texttt{Pok\'eChamp} runs a depth-limited minimax search online in which the LLM replaces three key components: 1) action sampling, where the LLM is guided by prompts (including output from a damage calculation tool); 2) opponent modeling, which uses the historical likelihood of actions from our dataset to model the effect of LLM-predicted opponent actions; and 3) state value calculation, in which the LLM reflects on each intrinsic state. \texttt{Pok\'eChamp} outperforms all existing AIs (76\%) and heuristic bots (84\%) by a wide margin, including winning consistently ($>$50\%) against prior human-parity work run with a frontier model, GPT-4o, while itself using an open-source 8-billion-parameter Llama 3.1 model. On the online ladder against competitive human players, \texttt{Pok\'eChamp} reaches an Elo of 1500, placing it in the top 10\% of players, an expert level of play. Finally, we collect the largest Pok\'emon battling dataset, comprising over 1 million games, including more than 150k high-Elo games; prepare a series of battling benchmarks based on real player data and puzzles to analyze specific battling abilities; and provide crucial updates to the local game engine. Our code is available \href{https://sites.google.com/view/pokechamp-llm}{online}.
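The search structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each turn is resolved as a pair of simultaneous moves and scores every candidate action by its worst case over the predicted opponent actions. The names `maximin_value`, `choose_action`, `propose`, `predict_opponent`, `step`, `evaluate`, and `is_terminal` are hypothetical placeholders. In the paper's setting the first, second, and fourth would presumably be backed by LLM action sampling, the opponent model, and LLM state evaluation; here they are plain callables.

```python
import math
from typing import Callable, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")  # game state (hypothetical placeholder type)
A = TypeVar("A")  # action (hypothetical placeholder type)


def maximin_value(
    state: S,
    depth: int,
    propose: Callable[[S], Sequence[A]],           # stand-in for LLM action sampling
    predict_opponent: Callable[[S], Sequence[A]],  # stand-in for the opponent model
    step: Callable[[S, A, A], S],                  # stand-in for the game engine
    evaluate: Callable[[S], float],                # stand-in for LLM state evaluation
    is_terminal: Callable[[S], bool],
) -> float:
    """Worst-case value of `state` for the searching player, up to `depth` turns."""
    if depth <= 0 or is_terminal(state):
        return evaluate(state)
    my_actions = list(propose(state))
    opp_actions = list(predict_opponent(state))
    if not my_actions or not opp_actions:
        # Nothing to expand, so fall back to the leaf evaluation.
        return evaluate(state)
    return max(
        min(
            maximin_value(step(state, a, b), depth - 1, propose,
                          predict_opponent, step, evaluate, is_terminal)
            for b in opp_actions
        )
        for a in my_actions
    )


def choose_action(
    state: S,
    depth: int,
    propose: Callable[[S], Sequence[A]],
    predict_opponent: Callable[[S], Sequence[A]],
    step: Callable[[S, A, A], S],
    evaluate: Callable[[S], float],
    is_terminal: Callable[[S], bool],
) -> Tuple[Optional[A], float]:
    """Return the candidate action with the best worst-case value, and that value."""
    my_actions = list(propose(state))
    opp_actions = list(predict_opponent(state))
    if not my_actions or not opp_actions:
        return None, evaluate(state)
    best_action: Optional[A] = None
    best_value = -math.inf
    for a in my_actions:
        # Assume the opponent answers with the reply that hurts us most.
        worst = min(
            maximin_value(step(state, a, b), depth - 1, propose,
                          predict_opponent, step, evaluate, is_terminal)
            for b in opp_actions
        )
        if best_action is None or worst > best_value:
            best_action, best_value = a, worst
    return best_action, best_value
```

Keeping the candidate lists short is what makes such a search tractable at depth: the branching factor per turn is the product of the two list lengths, so restricting expansion to a handful of proposed and predicted actions bounds the number of leaf evaluations.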
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13073