Keywords: Multi-Agent Systems, Collaborative Planning, Symbolic Actions, Benchmark, Scalable Embodied Environment, Large Language Agents
Abstract: We introduce CUBE (Collaborative Multi-Agent Block-Pushing Environment), a lightweight yet expressive testbed for studying embodied cooperation in multi-agent systems. While traditional agent-cooperation benchmarks designed for reinforcement learning emphasize low-level action spaces and scalar rewards, and symbolic planning domains emphasize logical reasoning under deterministic transitions, neither approach alone captures the combination of embodiment, uncertainty, and symbolic structure needed to evaluate emerging embodied LLM-based agents. CUBE addresses this gap by wrapping primitive block-pushing actions into a symbolic action vocabulary, enabling interpretable and compositional cooperation strategies. It also provides a library of symbolic concepts for customized feedback at both the per-agent and collective levels. These features allow the same environment to support reinforcement learning agents, LLM-based agents, and hybrid architectures. To enable easy and fair comparison across experiments, a single parameter $n$ specifies the number of agents, grid size, and block weights, creating a transparent curriculum that scales difficulty and cooperation demands. CUBE thus offers a flexible platform for the scalable evaluation of algorithms that integrate symbolic reasoning with embodied multi-agent interaction. The project is open-sourced at: https://happyeureka.github.io/cube.
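The single-parameter curriculum described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the released CUBE API: the class name `CubeConfig` and the specific scaling rules (grid side growing linearly with $n$, block weight equal to $n$ so that heavier blocks demand more cooperating agents) are assumptions chosen only to show how one knob can drive all three settings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CubeConfig:
    """Hypothetical config: a single parameter n scales the whole environment."""
    n: int  # the one difficulty knob

    @property
    def num_agents(self) -> int:
        return self.n  # one agent per unit of n (assumed scaling rule)

    @property
    def grid_size(self) -> int:
        return 2 * self.n + 1  # grid side grows linearly with n (assumed)

    @property
    def block_weight(self) -> int:
        return self.n  # heavier blocks require more agents to push together (assumed)

# Increasing n yields a transparent curriculum: more agents, a larger grid,
# and heavier blocks, so cooperation demands scale in lockstep.
for n in (1, 2, 4):
    cfg = CubeConfig(n=n)
    print(n, cfg.num_agents, cfg.grid_size, cfg.block_weight)
```

The point of the sketch is the design choice, not the particular formulas: because every environment dimension is a deterministic function of $n$, two experiments run at the same $n$ are directly comparable.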
Archival Option: The authors of this submission do *not* want it to appear in the archival proceedings.
Submission Number: 154