A Novel Variable Step-size Path Planning Framework with Step-Consistent Markov Decision Process For Large-Scale UAV Swarm

Dan Xu; Yunxiao Guo; Han Long; Chang Wang


Published: 01 Jan 2024, Last Modified: 24 May 2025, IROS 2024, CC BY-SA 4.0

Abstract: In recent years, Deep Reinforcement Learning (DRL) has become a key approach to solving Unmanned Aerial Vehicle (UAV) swarm path planning problems. However, traditional DRL methods often struggle in the initial learning stage and have difficulty learning variable step-size tasks. This paper introduces a novel training framework for large-scale UAV swarm variable step-size path planning: Rapidly-exploring Variable Step-size Deep Reinforcement Learning (RVSDRL). The framework combines common training on a local ground server with decentralized training on distributed UAVs. In the common training stage, we generate rapidly-exploring random graph samples to accelerate the common agent's exploration of the environment. In the decentralized training stage, we use a prioritized replay mechanism to improve training efficiency. To enhance convergence stability, we constrain the returns of equivalent paths and propose the Step-size Consistent Markov Decision Process (SCMDP) path planning model. Our method is compared with traditional methods, and the experiments demonstrate its superior performance in complex obstacle environments.
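The abstract does not give the SCMDP equations. As a hypothetical illustration of what "constraining the returns of equivalent paths" could mean, the sketch below assumes a constant reward `unit_reward` per unit of distance, a discount factor `gamma`, and a terminal reward `terminal_bonus` (all our assumptions, not the authors' formulation). A step of size k earns the discounted sum of k unit-step rewards and advances the discount by `gamma**k`, so splitting the same straight path into different step sizes yields the same return. The `naive_return` baseline charges one reward and one discount per decision, so it favors larger steps regardless of distance covered.

```python
import math
from typing import Sequence


def step_reward(k: int, unit_reward: float, gamma: float) -> float:
    """Reward for one step of size k: discounted sum of k unit-step rewards (assumed model)."""
    return sum(unit_reward * gamma**i for i in range(k))


def path_return(step_sizes: Sequence[int], unit_reward: float = -1.0,
                gamma: float = 0.95, terminal_bonus: float = 10.0) -> float:
    """Step-size-consistent return: each step of size k is discounted as k elapsed unit steps."""
    ret, discount = 0.0, 1.0
    for k in step_sizes:
        ret += discount * step_reward(k, unit_reward, gamma)
        discount *= gamma**k
    return ret + discount * terminal_bonus


def naive_return(step_sizes: Sequence[int], unit_reward: float = -1.0,
                 gamma: float = 0.95, terminal_bonus: float = 10.0) -> float:
    """Baseline: one reward and one discount per decision, ignoring step size."""
    ret, discount = 0.0, 1.0
    for _ in step_sizes:
        ret += discount * unit_reward
        discount *= gamma
    return ret + discount * terminal_bonus


if __name__ == "__main__":
    # Three hypothetical ways of covering the same 5-unit straight segment.
    splits = ([5], [3, 2], [1, 1, 1, 1, 1])
    consistent = [path_return(s) for s in splits]
    naive = [naive_return(s) for s in splits]
    print("consistent returns agree:", all(math.isclose(r, consistent[0]) for r in consistent))
    print("naive returns agree:", all(math.isclose(r, naive[0]) for r in naive))
```

Under these assumptions the consistent returns agree across all three splits while the naive ones do not, which is the kind of step-size bias the SCMDP model is described as removing.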
