Combining Multiagent Reinforcement Learning and Search Method for Drone Delivery on a Non-grid Graph

Shiyao Ding, Hideki Aoyama, Donghui Lin

Published: 01 Jan 2022, Last Modified: 12 Jun 2024, PAAMS 2022, CC BY-SA 4.0

Abstract: With the rapid growth of online commerce, drone delivery has shown potential to reduce logistical costs. Delivery with multiple drones can be formulated as a multiagent path finding (MAPF) problem, which seeks a set of collision-free paths for multiple agents. However, most prior work on MAPF has studied grid graphs, which are not well suited to the drone delivery problem. We therefore study a non-grid MAPF problem for drone delivery. Some algorithms for solving grid MAPF, which fall into two categories (search-based methods and dynamic programming methods), can also be applied to this new problem. However, the challenges created by non-grid features, such as a large state/action space, impede the application of either type of method. We therefore propose a novel approach that combines a search method with a dynamic programming method, which can accelerate the learning process. Experimental results show that the proposed method is more effective than some existing algorithms.