MAP-THOR: Benchmarking Long-Horizon Multi-Agent Planning Frameworks in Partially Observable Environments

Published: 18 Jun 2024, Last Modified: 05 Sept 2024
Venue: MFM-EAI@ICML2024 Poster
License: CC BY 4.0
Keywords: Multi-agent systems, Embodied AI, Task planning, Benchmarks, Partially observable environments, Long-horizon planning, Collaborative robotics, Language models
TL;DR: We propose a benchmark suite of household tasks for systematically evaluating multi-agent planning frameworks.
Abstract: Evaluating embodied multi-agent planners requires robust and versatile benchmarks. We introduce MAP-THOR (Multi-Agent Planning in AI2-THOR), a benchmark designed to assess embodied multi-agent planning systems in realistic, partially observable AI2-THOR environments. Existing benchmarks offer extensive environments for single-agent tasks but fail to capture the complexities inherent in multi-agent interaction, non-stationarity, partial observability, and long-horizon planning. To address these gaps, MAP-THOR supports the development of frameworks that allocate tasks and coordinate multiple agents. It provides a comprehensive suite of household tasks that demand collaboration and adaptation to dynamic environmental changes, mirroring real-world scenarios. The benchmark includes detailed metrics for success rate, efficiency, and collaborative effectiveness. Through experiments, we show that MAP-THOR offers a robust evaluation framework for language model (LM)-based multi-agent planning systems. Ultimately, we hope that MAP-THOR serves as a standard benchmark for identifying embodied multi-agent planning frameworks that systematically improve generalization in long-horizon planning under partial observability.
Submission Number: 26