Keywords: Agentic, VLA, Multi-Modal
TL;DR: A Multi-Modal Benchmark for Collaborative Agents in Minecraft
Abstract: Collaboration is a cornerstone of society. In the real world, human teammates draw on multi-sensory data to tackle challenging tasks in ever-changing environments. Embodied agents collaborating in visually rich environments replete with dynamic interactions must likewise understand multi-modal observations and task specifications. To evaluate the performance of generalizable multi-modal collaborative agents, we present TeamCraft, a multi-modal multi-agent benchmark built on top of the open-world video game Minecraft. The benchmark features 55,000 task variants specified by multi-modal prompts, procedurally generated expert demonstrations for imitation learning, and carefully designed protocols to evaluate model generalization capability. We also perform extensive analyses to better understand the limitations and strengths of existing approaches. Our results indicate that existing models continue to face significant challenges in generalizing to novel goals, scenes, and unseen numbers of agents. These findings underscore the need for further research in this area. The TeamCraft platform and dataset are publicly available at https://github.com/teamcraft-bench/teamcraft.
Dataset URL: https://huggingface.co/datasets/teamcraft/teamcraft_data
Code URL: https://github.com/teamcraft-bench/teamcraft
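For reference, a minimal sketch of one way to fetch the dataset from the URL above, assuming the `huggingface_hub` client library; the repo id comes from the Dataset URL, while the printed path is illustrative only and not part of the submission:

```python
# Hedged sketch: download the TeamCraft dataset from the Hugging Face Hub.
# Assumes `pip install huggingface_hub`; repo id taken from the Dataset URL above.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="teamcraft/teamcraft_data",
    repo_type="dataset",  # the URL points to a dataset repo, not a model repo
)
print(f"Dataset downloaded to: {local_path}")
```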
Primary Area: Datasets & Benchmarks for applications in language modeling and vision language modeling
Submission Number: 2420