Learn to Follow: Lifelong Multi-agent Pathfinding with Decentralized Replanning

Alexey Skrynnik; Anton Andreychuk; Maria Nesterova; Konstantin Yakovlev; Aleksandr Panov

Published: 09 Jun 2023, Last Modified: 18 Aug 2023, PRL

Keywords: Multi-agent Pathfinding, Reinforcement learning, Heuristic Search

TL;DR: We propose a hybrid algorithm that combines learnable and non-learnable components to solve lifelong multi-agent pathfinding in a decentralized fashion, and it outperforms the state of the art.

Abstract: The Multi-agent Pathfinding (MAPF) problem generally asks to find a set of conflict-free paths for a set of agents confined to a graph. In conventional MAPF scenarios, the graph and the agents' start and goal locations are known in advance. Thus, a centralized planning algorithm can be utilized to generate a solution. In this work, we investigate the decentralized MAPF setting, in which the agents cannot share information and must independently navigate toward their goals without knowing the other agents' goals or paths. We focus on the lifelong variant of MAPF, in which each agent is continuously assigned a new goal upon arrival at the previous one. To address this complex problem, we propose a method that integrates two complementary approaches: planning with heuristic search and reinforcement learning (RL) through policy optimization. Planning is utilized to maintain an individual path, while RL is employed to discover collision avoidance policies that effectively guide an agent along the path. This decomposition, together with intrinsic motivation specific to multi-agent scenarios, allows replanning to be combined with learnable policies. We evaluate our method on a wide range of setups and compare it to state-of-the-art competitors (both learnable and search-based). The results show that our method consistently outperforms the competitors in challenging setups with a high number of agents.
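The planner/follower decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the grid encoding, the function names `astar` and `follow_step`, and the rule-based follower are all assumptions made for illustration. In particular, `follow_step` is a hand-written stand-in for the learned RL policy, which the abstract does not specify.

```python
import heapq


def astar(grid, start, goal):
    """4-connected A* on a grid of 0 (free) / 1 (obstacle).

    Returns a list of cells from start to goal, or None if unreachable.
    Hypothetical helper: the abstract only says "heuristic search".
    """
    def h(cell):
        # Manhattan distance, admissible on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    parent = {start: None}
    g = {start: 0}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        if cost > g[cur]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])):
                continue
            if grid[nxt[0]][nxt[1]] == 1:
                continue
            new_cost = cost + 1
            if new_cost < g.get(nxt, float("inf")):
                g[nxt] = new_cost
                parent[nxt] = cur
                heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None


def follow_step(pos, path, occupied):
    """Stand-in for the learned follower (assumption, not the RL policy).

    Advances one cell along the individual path if the next cell is not
    occupied by another agent in the local observation; otherwise waits.
    Assumes pos lies on path.
    """
    idx = path.index(pos)
    if idx + 1 < len(path) and path[idx + 1] not in occupied:
        return path[idx + 1]
    return pos
```

In a lifelong loop, each agent would call `astar` whenever it receives a new goal and call `follow_step` at every time step using only the cells it currently observes as occupied. Under this sketch's assumptions, no information about other agents' goals or paths is needed.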

Submission Number: 22
