Modular and Hybrid Frameworks for LLM-Based Agents in Complex Strategy Games: An Empirical Study in Slay the Spire

Agents4Science 2025 Conference Submission 280 Authors

16 Sept 2025 (modified: 08 Oct 2025). Submitted to Agents4Science. License: CC BY 4.0.

Keywords: Agents, Slay the Spire, Game Strategy

Abstract: This paper investigates the performance of Large Language Model (LLM) agents in the complex strategic environment of the video game "Slay the Spire." While LLMs show promise as general game-playing agents, their effectiveness depends heavily on their underlying architectural design. We conduct a rigorous empirical study comparing five agent architectures: (1) a monolithic LLM agent, (2) the same agent augmented with short-term action memory, (3) a baseline non-LLM heuristic agent, (4) a hybrid agent combining heuristic navigation with LLM-driven combat, and (5) a modular LLM agent employing context-specific prompts for different game situations. Our analysis of player health (HP) progression across multiple game runs reveals that monolithic LLM agents, despite their reasoning capabilities, exhibit a brittle, high-risk playstyle characterized by poor resource management. Counter-intuitively, we find that adding a naive action memory fails to improve performance and in some cases degrades it. In contrast, agents based on task decomposition (both the hybrid and the fully modular LLM architectures) show substantially better HP preservation and strategic stability. The modular agent shows the most robust performance overall, highlighting that specializing LLM behavior through tailored, situational prompts is a critical strategy for developing competent agents in multifaceted domains. These findings deepen our understanding of LLM limitations in sequential decision-making and offer a clear architectural principle for future agent design: moving from monolithic, general-purpose reasoning to specialized, modular competence.
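The abstract contrasts monolithic, hybrid, and modular architectures without showing how a game state reaches the model. As a minimal sketch, assuming a simplified game state and an LLM modelled as a plain prompt-to-text function, the per-screen dispatch might look like the code below. `GameState`, `Screen`, the prompt texts, `parse_choice`, and `heuristic_policy` are all hypothetical illustrations. They are not the authors' implementation or any real Slay the Spire interface, and the memory-augmented and pure-heuristic baselines are omitted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List

# Assumption: an LLM is modelled as any function from a prompt string to a reply string.
LLM = Callable[[str], str]


class Screen(Enum):
    COMBAT = "combat"
    MAP = "map"
    CARD_REWARD = "card_reward"
    REST = "rest"


@dataclass
class GameState:
    # Hypothetical, heavily simplified view of the game.
    screen: Screen
    hp: int
    max_hp: int
    options: List[str]


# Hypothetical prompt templates: placeholders, not the prompts used in the study.
GENERIC_PROMPT = ("You are playing Slay the Spire. Screen: {screen}. "
                  "HP {hp}/{max_hp}. Pick exactly one option: {options}")
SITUATIONAL_PROMPTS: Dict[Screen, str] = {
    Screen.COMBAT: ("Combat turn. HP {hp}/{max_hp}. Block if incoming damage "
                    "is high. Pick exactly one action: {options}"),
    Screen.MAP: ("Map choice. HP {hp}/{max_hp}. Avoid elites when HP is low. "
                 "Pick exactly one node: {options}"),
    Screen.CARD_REWARD: "Card reward. Keep the deck focused. Pick one: {options}",
}


def render(template: str, state: GameState) -> str:
    # str.format ignores keyword arguments a template does not use.
    return template.format(screen=state.screen.value, hp=state.hp,
                           max_hp=state.max_hp, options=", ".join(state.options))


def parse_choice(reply: str, options: List[str]) -> str:
    """Return the first legal option mentioned in the reply, else the first option."""
    for option in options:
        if option.lower() in reply.lower():
            return option
    return options[0]


def monolithic_agent(state: GameState, llm: LLM) -> str:
    # One general-purpose prompt for every game situation.
    return parse_choice(llm(render(GENERIC_PROMPT, state)), state.options)


def modular_agent(state: GameState, llm: LLM) -> str:
    # Route to a situation-specific prompt; fall back to the generic one.
    template = SITUATIONAL_PROMPTS.get(state.screen, GENERIC_PROMPT)
    return parse_choice(llm(render(template, state)), state.options)


def heuristic_policy(state: GameState) -> str:
    # Toy rule (hypothetical): rest when below half HP, otherwise take the first option.
    if state.hp < state.max_hp / 2 and "Rest" in state.options:
        return "Rest"
    return state.options[0]


def hybrid_agent(state: GameState, llm: LLM) -> str:
    # The LLM handles combat with a combat prompt; the heuristic handles everything else.
    if state.screen is Screen.COMBAT:
        prompt = render(SITUATIONAL_PROMPTS[Screen.COMBAT], state)
        return parse_choice(llm(prompt), state.options)
    return heuristic_policy(state)


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        return "I will play Defend."

    combat = GameState(Screen.COMBAT, hp=20, max_hp=80, options=["Strike", "Defend"])
    map_screen = GameState(Screen.MAP, hp=20, max_hp=80, options=["Elite", "Rest"])
    print(modular_agent(combat, fake_llm))    # Defend
    print(hybrid_agent(map_screen, fake_llm))  # Rest (the LLM is never called)
```

In this sketch the three agents share the same model call and differ only in how each screen is mapped to a prompt or to a rule, which is the kind of task decomposition the abstract credits for better HP preservation.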

Submission Number: 280