DL-DRL: A Double-Level Deep Reinforcement Learning Approach for Large-Scale Task Scheduling of Multi-UAV

Published: 01 Jan 2025, Last Modified: 22 Jul 2025 · IEEE Trans. Autom. Sci. Eng. 2025 · CC BY-SA 4.0
Abstract: Exploiting unmanned aerial vehicles (UAVs) to execute tasks has been gaining popularity in recent years. For the underlying task scheduling problem, conventional exact and heuristic algorithms face challenges such as rapidly increasing computation time and heavy reliance on domain knowledge, particularly for large-scale problems. Deep reinforcement learning (DRL) based methods, which learn useful patterns from massive data, demonstrate notable advantages. However, their decision space becomes prohibitively large as the problem scales up, degrading computational efficiency. To alleviate this issue, we propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide-and-conquer framework (DCF), in which we decompose the task scheduling of multi-UAV into task allocation and route planning. In particular, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate tasks to different UAVs, and we exploit another attention-based policy network in our lower-level DRL model to construct the route for each UAV, with the objective of maximizing the total value of executed tasks given the maximum flight distance of each UAV. To effectively train the two models, we design an interactive training strategy (ITS), which comprises pre-training, intensive training and alternate training. Experimental results show that our DL-DRL performs favorably against learning-based and conventional baselines, including OR-Tools, in terms of both solution quality and computational efficiency. We also verify the generalization performance of our approach by applying it to larger instances of up to 1500 tasks and to different UAV flight distances. Moreover, an ablation study shows that our ITS helps achieve a balance between performance and training efficiency. Our code is publicly available at https://faculty.csu.edu.cn/guohuawu/zh_CN/zdylm/193832/list/index.htm.
Note to Practitioners—Unmanned aerial vehicles (UAVs) are of great practical value, with many real-world applications. When a group of UAVs is employed to execute large-scale tasks, a core question is how to schedule the UAVs so that they complete the tasks efficiently. This is a computationally hard problem due to the exponentially growing search space. To solve it, we propose a double-level deep reinforcement learning (DL-DRL) approach within a divide-and-conquer framework (DCF), where the upper-level DRL model is responsible for task allocation and the lower-level DRL model is responsible for UAV route planning. To better train the two DRL models, which interact with each other, we propose a simple yet efficient training strategy, termed the interactive training strategy (ITS), which comprises pre-training, intensive training and alternate training. Experimental results on instances of various scales show that our DL-DRL approach outperforms learning-based and conventional baselines, and that the designed ITS strikes a good balance between performance and training efficiency. In light of these verified advantages, we believe our DL-DRL approach has favorable potential for solving practical multi-UAV task scheduling problems in the real world.
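To make the divide-and-conquer decomposition concrete, the sketch below illustrates the two-level structure in plain Python: an upper level that allocates tasks to UAVs and a lower level that builds each UAV's route under a maximum flight distance, maximizing total task value. The DRL policies of the paper are replaced here by simple hypothetical heuristics (nearest-depot allocation and a greedy value-per-distance route builder); all function names and the task/depot representation are illustrative assumptions, not the authors' implementation.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def allocate(tasks, depots):
    # Upper level (placeholder for the encoder-decoder policy):
    # assign each task to the UAV whose depot is nearest.
    groups = [[] for _ in depots]
    for t in tasks:
        i = min(range(len(depots)), key=lambda k: dist(t["pos"], depots[k]))
        groups[i].append(t)
    return groups

def plan_route(tasks, depot, max_dist):
    # Lower level (placeholder for the attention-based policy):
    # greedily pick the task with the best value per extra distance,
    # always keeping enough budget to return to the depot.
    route, pos, used, value = [], depot, 0.0, 0.0
    remaining = list(tasks)
    while remaining:
        best = max(remaining,
                   key=lambda t: t["value"] / (dist(pos, t["pos"]) + 1e-9))
        step = dist(pos, best["pos"])
        if used + step + dist(best["pos"], depot) > max_dist:
            break  # flight-distance budget exhausted
        route.append(best)
        used += step
        value += best["value"]
        pos = best["pos"]
        remaining.remove(best)
    return route, value

def schedule(tasks, depots, max_dist):
    # Divide-and-conquer loop: allocate, then plan each UAV's route.
    total, routes = 0.0, []
    for group, depot in zip(allocate(tasks, depots), depots):
        route, v = plan_route(group, depot, max_dist)
        routes.append(route)
        total += v
    return routes, total
```

In the paper, both placeholder heuristics are learned policies trained jointly via the interactive training strategy; the sketch only conveys the interface between the two levels.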