Abstract: In this paper, we present a new approach to complex routing problems based on an improved state representation that exploits model complexity more effectively than previous methods. This is enabled by training from temporal differences; specifically, we employ Q-Learning. We show that our approach achieves state-of-the-art performance among autoregressive policies that sequentially insert nodes to construct solutions for the Capacitated Vehicle Routing Problem (CVRP). Additionally, we are the first to tackle the Multiple Depot Vehicle Routing Problem (MDVRP) with Reinforcement Learning (RL), and we demonstrate that this problem type benefits greatly from our approach compared with other Machine Learning (ML) methods.
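
For readers unfamiliar with temporal-difference training, the following is a minimal sketch of the standard one-step Q-Learning target applied to a node-selection setting. It is not the paper's implementation: the function name `td_target`, the feasibility mask, the reward of -1.0 (standing in for a negative travel cost), and the discount `gamma` are all illustrative assumptions.

```python
import numpy as np

def td_target(reward, next_q, feasible, done, gamma=1.0):
    """One-step Q-Learning target: r + gamma * max over feasible next actions.

    Hypothetical helper: `next_q` holds Q-value estimates for each candidate
    node in the next state, `feasible` marks nodes that may still be inserted.
    """
    if done or not feasible.any():
        # Terminal step: no future value to bootstrap from.
        return float(reward)
    return float(reward + gamma * np.max(next_q[feasible]))

# Toy example (assumed values): four candidate nodes, two already visited.
next_q = np.array([5.0, -2.0, 9.0, -3.5])
feasible = np.array([False, True, False, True])
target = td_target(reward=-1.0, next_q=next_q, feasible=feasible, done=False)
```

The mask restricts the maximum to nodes that can still be inserted, so Q-value estimates for infeasible nodes do not leak into the target, however large they are.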