- Keywords: reinforcement learning, neural combinatorial optimization, vehicle routing problem with time windows, attention model
- Abstract: In contrast to classical techniques for solving combinatorial optimization problems, recent advances in reinforcement learning offer the potential to learn heuristics independently, without human intervention. In this context, this paper presents a complete framework for solving the vehicle routing problem with time windows (VRPTW) using neural networks and reinforcement learning. Our approach is based mainly on an attention model (AM) that predicts a near-optimal distribution over solutions across different problem instances. To optimize its parameters, the model is trained in a reinforcement learning (RL) environment using a stochastic policy gradient, with the reward evaluated in real time so that solutions meet the problem's business and logical constraints. On synthetic data, the proposed model outperforms some existing baselines. The comparison is based on solution quality (total tour length) and computation time (inference time) for small and medium-sized instances.