Reasoning Under Pressure: LLMs in Competitive Pokémon Battles

Published: 16 Oct 2025, Last Modified: 10 Nov 2025, Venue: NeurIPS 2025 ER Workshop, License: CC BY 4.0
Keywords: Large Language Models, Strategic Reasoning, Pokémon Battles, Multi-agent Systems, Adversarial Environments, Natural Language Rationales, Efficient Reasoning, Tournament Simulation, Interpretability, Risk and Adaptation
TL;DR: We introduce LLM Pokémon League, a benchmark where models battle in Pokémon, revealing differences in strategy, reasoning style, and efficiency in an interpretable, adversarial setting.
Abstract: We introduce LLM Pokémon League, a system that uses competitive Pokémon battles to study how large language models (LLMs) reason and make strategic decisions. In this setup, models from OpenAI, Anthropic, and Google face each other in tournaments where they must build teams, choose moves, and adapt to uncertain situations. Each action is explained in natural language, allowing us to closely examine how models reason, adjust, and plan during the game. Unlike evaluation methods that require large resources, LLM Pokémon League offers a lightweight and accessible way to observe real-time strategy and reasoning. Our experiments show clear differences in how models approach battles, from careful team balance to risk-heavy play styles. By framing reasoning as a competitive game with transparent choices, LLM Pokémon League provides a practical way to compare and understand the strategic abilities of today’s leading models.
Submission Number: 113