Abstract: Using multiple unmanned aerial vehicles (UAVs) with backscatter communication to collect data from Internet of Things (IoT) devices has emerged as a promising solution. However, many existing UAV path planning schemes for data collection suffer from performance degradation because they give limited consideration to full cooperation among UAVs and to dynamic, stochastic environments. We therefore propose a path planning scheme for the data collection task in multi-UAV IoT networks, based on multi-agent reinforcement learning (MARL), that minimizes the task completion time. Because decision making among the agents is inherently asynchronous, we model the path planning problem as a macro-action decentralized partially observable Markov decision process. Furthermore, we design an action mask mechanism that improves data efficiency and accelerates training. Simulation results show that our scheme reduces the average task completion time by 15%.
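The action mask mechanism mentioned above is, in its common form, the practice of suppressing infeasible actions before the policy samples, so that training samples are not wasted on actions the environment would reject. The abstract does not specify the implementation, so the following PyTorch sketch is only an illustrative assumption: logits of masked actions are driven to a large negative value before the softmax, and the mask construction for the UAV setting (e.g., unreachable waypoints) is hypothetical.

```python
import torch

def mask_logits(logits: torch.Tensor, valid_mask: torch.Tensor) -> torch.Tensor:
    """Assign a large negative value to logits of invalid actions so the
    policy assigns them (near-)zero probability; valid_mask is 1 for
    feasible actions and 0 otherwise."""
    return logits.masked_fill(valid_mask == 0, -1e9)

# Hypothetical example: 5 candidate waypoints, only 3 currently feasible.
logits = torch.randn(1, 5)
valid_mask = torch.tensor([[1, 1, 0, 1, 0]])
probs = torch.softmax(mask_logits(logits, valid_mask), dim=-1)
action = torch.multinomial(probs, 1)  # infeasible waypoints are never sampled
```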