Abstract: Owing to their flexible mobility and stable network connectivity, unmanned aerial vehicles (UAVs) are increasingly deployed as mobile data collectors, greatly expanding the scope of data collection. However, safe and effective path planning for multiple UAVs in dynamic environments and over complex terrain remains challenging: dense flight paths cause frequent conflicts, dynamic environments yield incomplete observations, and limited exploration risks convergence to local optima. We therefore propose a UAV path planning approach based on deep reinforcement learning (DRL). Specifically, we employ the multi-agent proximal policy optimization (MAPPO) algorithm to maximize the data collection rate. We first model the multi-UAV path planning problem as a multi-agent partially observable Markov decision process (MA-POMDP) and extend the standard proximal policy optimization (PPO) algorithm to a multi-agent learning framework. Then, to improve training efficiency and the decision-making capability of the UAVs, we adopt centralized training with decentralized execution (CTDE), enabling the UAVs to share information and policies effectively. Furthermore, to mitigate convergence to local optima caused by insufficient exploration of alternative actions and policies, we introduce entropy regularization into the policy objective function, allowing the agents to learn more comprehensive and effective path-planning policies. Simulation results show that the proposed algorithm maximizes total system throughput while satisfying constraints on flight duration, age of information, and collision avoidance.
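The abstract does not state the objective function explicitly; as a minimal sketch, the entropy-regularized clipped surrogate that MAPPO-style methods typically optimize for each agent $i$ can be written as

$$
L^{i}(\theta) = \mathbb{E}_{t}\!\left[\min\!\big(r_{t}^{i}(\theta)\,\hat{A}_{t}^{i},\; \operatorname{clip}\!\big(r_{t}^{i}(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_{t}^{i}\big)\right] + \beta\,\mathbb{E}_{t}\!\left[\mathcal{H}\!\big(\pi_{\theta}(\cdot \mid o_{t}^{i})\big)\right],
$$

where $r_{t}^{i}(\theta) = \pi_{\theta}(a_{t}^{i} \mid o_{t}^{i}) / \pi_{\theta_{\text{old}}}(a_{t}^{i} \mid o_{t}^{i})$ is the probability ratio over agent $i$'s local observation $o_{t}^{i}$, $\hat{A}_{t}^{i}$ is an advantage estimate produced by a centralized critic under the CTDE scheme, $\mathcal{H}$ denotes the policy entropy that drives exploration, and $\epsilon$ and $\beta$ are the clipping threshold and entropy coefficient. This is the standard entropy-regularized PPO form rather than the paper's exact formulation; the notation is assumed for illustration.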
Submission Number: 112