Keywords: Multi-Agent Systems, Collaborative Planning, Symbolic Actions, Benchmark, Scalable Embodied Environment, Large Language Agents
Abstract: We introduce CUBE (Collaborative Multi-Agent Block-Pushing Environment), a lightweight yet expressive testbed for studying embodied cooperation in multi-agent systems. While traditional agent-cooperation benchmarks designed for reinforcement learning emphasize low-level action spaces and scalar rewards, and symbolic planning domains emphasize logical reasoning under deterministic transitions, neither approach alone captures the combination of embodiment, uncertainty, and symbolic structure needed to evaluate emerging embodied LLM-based agents. CUBE addresses this gap by wrapping primitive block-pushing actions into a symbolic action vocabulary, enabling interpretable and compositional cooperation strategies. It also provides a library of symbolic concepts for customized feedback at both the per-agent and collective levels. These features allow the same environment to support reinforcement learning agents, LLM-based agents, and hybrid architectures. To enable easy and fair comparison across experiments, a single parameter $n$ specifies the number of agents, grid size, and block weights, creating a transparent curriculum that scales difficulty and cooperation demands. CUBE thus offers a flexible platform for the scalable evaluation of algorithms that integrate symbolic reasoning with embodied multi-agent interaction. The project is open-sourced at: https://happyeureka.github.io/cube.
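The single-parameter curriculum described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the released CUBE API: the class name `CubeConfig` and the specific scaling rules (grid side growing linearly with $n$, block weight equal to $n$ so that heavier blocks demand more cooperating agents) are assumptions chosen only to show how one knob can drive all three settings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CubeConfig:
    """Hypothetical config: a single parameter n scales the whole environment."""
    n: int  # the one difficulty knob

    @property
    def num_agents(self) -> int:
        return self.n  # one agent per unit of n (assumed scaling rule)

    @property
    def grid_size(self) -> int:
        return 2 * self.n + 1  # grid side grows linearly with n (assumed)

    @property
    def block_weight(self) -> int:
        return self.n  # heavier blocks require more agents to push together (assumed)

# Increasing n yields a transparent curriculum: more agents, a larger grid,
# and heavier blocks, so cooperation demands scale in lockstep.
for n in (1, 2, 4):
    cfg = CubeConfig(n=n)
    print(n, cfg.num_agents, cfg.grid_size, cfg.block_weight)
```

The point of the sketch is the design choice, not the particular formulas: because every environment dimension is a deterministic function of $n$, two experiments run at the same $n$ are directly comparable.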
Archival Option: The authors of this submission do *not* want it to appear in the archival proceedings.
Submission Number: 154