TAME the BALROG: Task-Adaptive Modular Evolution framework for Game Agents

Published: 27 May 2026, Last Modified: 27 May 2026 | CompLearn 2026 Poster | CC BY 4.0
Keywords: Agentic models, evolution, modularity, games
TL;DR: We introduce an evolutionary framework that adapts LLM agents to six diverse games and achieves state-of-the-art performance on the BALROG leaderboard.
Abstract: Interactive games have proven to be key benchmarks for advancing Artificial Intelligence (AI), requiring capabilities such as long-term planning, exploration, and adaptation to stochastic environments. While Large Language Models (LLMs) have achieved notable results across many domains, they struggle in complex gaming environments such as those featured on the BALROG leaderboard. The absence of adaptive frameworks that can dynamically configure themselves based on environmental characteristics limits the progress of AI in games. To this end, we introduce the Task-Adaptive Modular Evolution (TAME) framework, which employs genetic algorithms to evolve environment-specific structures from modular components, enabling significant performance improvements of LLMs across diverse domains. TAME discovers high-performing configurations by selecting between baseline and hierarchical structures, selectively incorporating specialised modules, and fine-tuning each component through systematic mutations. Evaluated across six varied games, TAME discovers architectures that deliver substantial gains: Gemini-3.1-Flash-Lite improves on average from 34.5% to 42.8%, GPT5-Minimal-Thinking from 32.8% to 44.8%, and Claude-Haiku-4.5 from 27.4% to 35.0%. Moreover, these structures transfer well to larger models of the same family: by transferring TAME-discovered architectures to Gemini-3.1-Pro, we achieve state-of-the-art performance on the BALROG leaderboard.
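As a rough illustration of the search loop the abstract describes (choose a structure, toggle modules, mutate component settings), the following is a minimal sketch and not the authors' implementation. The module names, the `temperature` parameter, the elitist selection scheme, and `toy_fitness` are all assumptions made for illustration; in practice the fitness would come from running game episodes.

```python
import random
from dataclasses import dataclass, replace

# Hypothetical search space: the abstract does not list the actual modules.
STRUCTURES = ("baseline", "hierarchical")
MODULES = ("memory", "reflection", "planner")


@dataclass(frozen=True)
class Config:
    structure: str          # "baseline" or "hierarchical"
    modules: frozenset      # subset of MODULES that is switched on
    temperature: float      # hypothetical per-component setting in [0, 1]


def random_config(rng):
    """Sample one configuration uniformly from the toy search space."""
    return Config(
        structure=rng.choice(STRUCTURES),
        modules=frozenset(m for m in MODULES if rng.random() < 0.5),
        temperature=round(rng.uniform(0.0, 1.0), 2),
    )


def mutate(cfg, rng):
    """Apply exactly one mutation: flip structure, toggle a module, or nudge a setting."""
    kind = rng.randrange(3)
    if kind == 0:
        other = [s for s in STRUCTURES if s != cfg.structure]
        return replace(cfg, structure=rng.choice(other))
    if kind == 1:
        module = rng.choice(MODULES)
        # Symmetric difference adds the module if absent, removes it if present.
        return replace(cfg, modules=cfg.modules ^ {module})
    nudged = min(1.0, max(0.0, cfg.temperature + rng.uniform(-0.2, 0.2)))
    return replace(cfg, temperature=round(nudged, 2))


def evolve(fitness, generations=20, pop_size=8, elite=2, seed=0):
    """Elitist genetic search: keep the top configs, refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:elite]
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size - elite)]
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(cfg):
    """Stand-in for an environment score; NOT a real game evaluation."""
    score = 1.0 if cfg.structure == "hierarchical" else 0.0
    score += 0.5 * len(cfg.modules & {"memory", "planner"})
    score -= abs(cfg.temperature - 0.3)
    return score


best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

Because the top `elite` configurations are carried over unchanged each generation, the best fitness in the population never decreases. Running one such search per environment would yield environment-specific configurations, which is the sense in which the sketch mirrors the abstract.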
Submission Number: 75