Abstract: The rapid success of large language models (LLMs) has spurred extensive research into their ability to solve a wide range of tasks. However, their potential in multi-agent planning remains underexplored. Multi-agent planning presents unique challenges due to the combined complexity of coordination and long-horizon reasoning, often making it difficult to leverage external tools for assistance. In this paper, we introduce Multi-Agent Path Finding (MAPF), also known as multi-robot route planning, as a novel benchmark for evaluating the reasoning capabilities of LLMs. We first describe how the MAPF benchmark can be adapted for LLM-based evaluation, including dataset curation and an agentic workflow for LLMs. We show the motivating success of LLMs on single-agent planning and on multi-agent pathfinding in an empty room map without obstacles, followed by their failure to plan on the harder room and maze maps of the standard MAPF benchmark. We present our position on why directly solving MAPF with LLMs has not yet been successful, and we use various experiments to support our hypothesis. Based on our results, we discuss how researchers with different backgrounds could help with this problem from different perspectives.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Jiang_Bian1
Submission Number: 4208