Deep Reinforcement Learning with Two-Stage Training Strategy for Practical Electric Vehicle Routing Problem with Time Windows

Abstract: Deep reinforcement learning (DRL) has recently shown promise for the vehicle routing problem (VRP), which arises widely in modern logistics systems. A practical extension of the VRP is the electric vehicle routing problem with time windows (EVRPTW). In this problem, realistic travel distances and times are non-Euclidean and asymmetric, and the constraints are more complex, which makes the problem challenging for DRL approaches. This paper proposes a novel end-to-end DRL method with a two-stage training strategy. First, a graph attention network with edge features is designed to handle graphs with asymmetric travel distance and time matrices, so that the node and edge features of the graph are effectively correlated and captured. Then, a two-stage training strategy is proposed to handle the complicated constraints. In the first stage, some constraints are allowed to be violated to enhance exploration; in the second stage, all constraints are enforced to guarantee a feasible solution. Experimental results show that our method outperforms state-of-the-art methods and generalizes well to different problem sizes.
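
To make the two-stage idea concrete, the following is a minimal sketch of a stage-dependent action mask, not the paper's implementation. The abstract does not say which constraints are relaxed in the first stage, so the choice here (time windows and battery are relaxed, revisiting is always forbidden) is an assumption, as are all names (`State`, `feasible_mask`, `travel_time`, `energy`, `due`) and the toy data.

```python
from dataclasses import dataclass


@dataclass
class State:
    current: int        # index of the node the vehicle is at (0 = depot)
    time: float         # current clock time
    battery: float      # remaining energy
    visited: frozenset  # customers already served


def feasible_mask(state, travel_time, energy, due, stage):
    """Return one boolean per node: True if moving there is allowed.

    Stage 1 (assumed relaxation): only "do not stay put, do not revisit"
    is enforced, so time-window and battery violations are permitted.
    Stage 2: the time-window deadline and battery limit are enforced too.
    """
    mask = []
    for j in range(len(due)):
        ok = j != state.current and j not in state.visited
        if ok and stage == 2:
            # Row = origin, column = destination; the matrices may be asymmetric.
            arrival = state.time + travel_time[state.current][j]
            ok = arrival <= due[j] and energy[state.current][j] <= state.battery
        mask.append(ok)
    return mask


# Hypothetical 3-node instance: depot 0 and customers 1, 2.
# travel_time[i][j] != travel_time[j][i] in general (asymmetric).
travel_time = [[0, 5, 9], [7, 0, 3], [4, 6, 0]]
energy = [[0, 2, 6], [3, 0, 1], [2, 2, 0]]
due = [100, 10, 8]  # latest allowed arrival time at each node

start = State(current=0, time=0.0, battery=5.0, visited=frozenset())
stage1 = feasible_mask(start, travel_time, energy, due, stage=1)
stage2 = feasible_mask(start, travel_time, energy, due, stage=2)
```

In this toy instance, customer 2 is selectable in stage 1 but masked out in stage 2, because arriving at time 9 misses its deadline of 8 and the trip needs 6 units of energy against a battery of 5. In a training loop, a policy would presumably sample only from nodes where the mask is True, with relaxed violations in the first stage handled by some penalty in the reward; that detail is likewise an assumption here.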