TAME the BALROG: Task-Adaptative Modular Emergent framework for Game Agents

ICLR 2026 Conference Submission 17761 Authors

19 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | CC BY 4.0

Keywords: open-endedness, agents, LLMs, framework, evolution, games, hierarchy

TL;DR: We introduce an emergent framework that adapts LLM agents to games and achieves state-of-the-art performance on the BALROG benchmark.

Abstract: Interactive games have proven to be key benchmarks for advancing artificial intelligence, requiring capabilities like long-term planning, exploration, and adaptation to stochastic environments. While Large Language Models (LLMs) have achieved notable results across many domains, they struggle in complex gaming environments like those in the BALROG benchmark. The absence of adaptive frameworks that can dynamically configure themselves based on environmental characteristics limits the progress of AI in games. To this end, we introduce the Task-Adaptive Modular Emergence (TAME) framework, which employs genetic algorithms to evolve environment-specific structures from modular components, enabling significant performance improvements of LLMs across diverse domains. TAME discovers high-performing configurations by selecting between baseline and hierarchical structures, selectively incorporating specialised modules, and fine-tuning each component through systematic mutations. Evaluating TAME across the BALROG benchmark, we find that the emergent modular structures it discovers significantly enhance LLM performance, raising the average progression score of Gemini-2.0-Flash from 27.15% to 34.77%. Moreover, these structures transfer across models. Directly applying the structures that TAME discovered for Gemini-2.0-Flash to a population of Gemini-2.5-Pro, we achieve new state-of-the-art performance on BALROG. This transferability suggests that TAME identifies fundamental structural principles for game-playing agents that adapt their cognitive architecture to match task demands.
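The search loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the module names, the single `temperature` setting, the mutation-only scheme with elitism (no crossover), and the `toy_fitness` function are all assumptions made for illustration. A real run would score each configuration by playing BALROG episodes and measuring progression.

```python
import random
from dataclasses import dataclass

# Hypothetical names: the abstract does not list TAME's actual modules.
MODULES = ("memory", "planner", "reflection")
STRUCTURES = ("baseline", "hierarchical")


@dataclass(frozen=True)
class Config:
    structure: str        # "baseline" or "hierarchical"
    modules: frozenset    # subset of MODULES that is switched on
    temperature: float    # assumed stand-in for a tunable component setting


def random_config(rng):
    return Config(
        structure=rng.choice(STRUCTURES),
        modules=frozenset(m for m in MODULES if rng.random() < 0.5),
        temperature=round(rng.uniform(0.0, 1.0), 2),
    )


def mutate(cfg, rng):
    """Apply one mutation: flip the structure, toggle a module, or nudge the setting."""
    kind = rng.choice(("structure", "module", "temperature"))
    if kind == "structure":
        other = STRUCTURES[1 - STRUCTURES.index(cfg.structure)]
        return Config(other, cfg.modules, cfg.temperature)
    if kind == "module":
        toggled = cfg.modules ^ {rng.choice(MODULES)}
        return Config(cfg.structure, toggled, cfg.temperature)
    nudged = min(1.0, max(0.0, cfg.temperature + rng.uniform(-0.1, 0.1)))
    return Config(cfg.structure, cfg.modules, round(nudged, 2))


def toy_fitness(cfg):
    """Placeholder score (an assumption); stands in for measured game progression."""
    score = 1.0 if cfg.structure == "hierarchical" else 0.0
    score += len(cfg.modules & {"memory", "planner"})
    score -= abs(cfg.temperature - 0.3)
    return score


def evolve(fitness, generations=20, pop_size=8, elite=2, seed=0):
    """Keep the top `elite` configurations each generation and refill by mutation."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size - elite)]
        population = parents + children
    return max(population, key=fitness)


best = evolve(toy_fitness)
print(best.structure, sorted(best.modules), best.temperature)
```

Because the elite configurations are carried over unchanged, the best fitness in the population never decreases between generations. Swapping `toy_fitness` for an environment-specific evaluation is what would make the evolved structure task-adaptive in the sense the abstract describes.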

Primary Area: foundation or frontier models, including LLMs

Submission Number: 17761
