Track: Regular paper
Keywords: Dungeons and Dragons, LLM, multi-agent
Abstract: Dungeons & Dragons (D&D) has long been regarded as an intellectually demanding game of strategic planning and role-playing. Large language models (LLMs) are increasingly deployed as autonomous or semi-autonomous agents, yet most evaluations still target single-turn QA or short-horizon tasks. Assessing agentic performance in rules-constrained, multi-step settings is challenging because style-conforming narration can diverge from task optimality. In this work, we present D&D Agents, a multi-agent simulator that automatically runs D&D combat, in which LLMs assume the roles of the referee (the 'Dungeon Master', DM), the players, and adversarial monsters in tactically rich encounters. The combat stresses long-horizon planning, reliable tool use, rule compliance, and mixed cooperative–adversarial dynamics, grounding actions in function calls for movement and positioning, attacks (including spellcasting), line-of-sight (LoS) checks, and resource management. We evaluate transcripts and tool traces along six axes: Function Usage, Parameter Fidelity, Acting Quality, Tactical Optimality, State Tracking, and Function Efficiency. Together, these axes capture both capability and reliability in closed-loop play.
Submission Number: 31