TAME the BALROG: Task-Adaptative Modular Emergent framework for Game Agents

19 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | Readers: Everyone | CC BY 4.0
Keywords: open-endedness, agents, LLMs, framework, evolution, games, hierarchy
TL;DR: We introduce an emergent framework that adapts LLM agents to games and achieves state-of-the-art performance on the BALROG benchmark.
Abstract: Interactive games have proven to be key benchmarks for advancing Artificial Intelligence (AI), requiring capabilities such as long-term planning, exploration, and adaptation to stochastic environments. While Large Language Models (LLMs) have achieved notable results across many domains, they struggle in complex gaming environments such as those in the BALROG benchmark. The absence of adaptive frameworks that can dynamically configure themselves based on environmental characteristics limits the progress of AI in games. To this end, we introduce the Task-Adaptive Modular Emergence (TAME) framework, which employs genetic algorithms to evolve environment-specific structures from modular components, enabling significant performance improvements of LLMs across diverse domains. TAME discovers high-performing configurations by selecting between baseline and hierarchical structures, selectively incorporating specialised modules, and fine-tuning each component through systematic mutations. Evaluated across the BALROG benchmark, TAME discovers architectures that deliver substantial gains: Gemini-2.0-Flash improves from 27.16% to 35.05%, while GPT4.1-nano rises from 9.91% to 17.20%. Moreover, these structures transfer well to larger models of the same family: transferring them to Gemini-2.5-Pro, we achieve new state-of-the-art performance on BALROG.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17761
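
Illustrative sketch (not the authors' implementation): the abstract describes a genetic search that chooses between baseline and hierarchical structures and toggles optional modules through mutations. The toy Python below shows one way such a loop could look. The module names, the `Config` type, the mutation rates, and `toy_fitness` are all assumptions made for illustration; a real run would presumably score each configuration by evaluating an LLM agent in the target environment.

```python
import random
from dataclasses import dataclass, replace

# Hypothetical module names, chosen only for illustration.
MODULES = ("memory", "planner", "reflection")


@dataclass(frozen=True)
class Config:
    hierarchical: bool   # False = baseline structure, True = hierarchical
    modules: frozenset   # optional modules enabled in this configuration


def random_config(rng):
    """Sample a structure choice and a random subset of modules."""
    mods = frozenset(m for m in MODULES if rng.random() < 0.5)
    return Config(rng.random() < 0.5, mods)


def mutate(cfg, rng):
    """Flip the structure choice or toggle one module (always changes cfg)."""
    if rng.random() < 0.25:
        return replace(cfg, hierarchical=not cfg.hierarchical)
    module = rng.choice(MODULES)
    return replace(cfg, modules=cfg.modules ^ {module})


def evolve(fitness, generations=20, pop_size=8, seed=0):
    """Elitist genetic search: keep the top half, refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: pop_size // 2]
        children = [mutate(rng.choice(elite), rng)
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    return max(population, key=fitness)


def toy_fitness(cfg):
    # Assumed stand-in for an environment score, not a BALROG metric.
    useful = len(cfg.modules & {"memory", "planner"})
    penalty = int("reflection" in cfg.modules)
    return int(cfg.hierarchical) + useful - penalty


best = evolve(toy_fitness)
```

Because the top half of each generation is carried over unchanged, the best score in the population never decreases between generations; the search is run once per environment so that each environment can end up with its own structure.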