ScheduleNet: Learn to Solve MinMax mTSP Using Reinforcement Learning with Delayed RewardDownload PDF

28 Sept 2020 (modified: 05 May 2023)ICLR 2021 Conference Blind SubmissionReaders: Everyone
Abstract: Combinatorial Optimization (CO) problems are theoretically challenging yet crucial in practice. Numerous works used Reinforcement Learning (RL) to tackle these CO problems. As current approaches mainly focus on single-worker CO problems such as the famous Travelling Salesman Problem (TSP), we focus on more practical extension of TSP to multi-worker (salesmen) setting, specifically MinMax mTSP. From the RL perspective, Minmax mTSP raises several significant challenges, such as the cooperation of multiple workers and the need for a well-engineered reward function. In this paper, we present the RL framework with (1) worker-task heterograph and type-aware Graph Neural Network, and (2) the RL training method that is stable, has fast convergence speed, and directly optimizes the objective of MinMax mTSP in a delayed reward setting. We achieve comparable performance to a highly optimized meta-heuristic baseline, OR-Tools, and outperforms it in 10% of the cases, both on in-training and out-of-training problem distributions. Moreover, our problem formulation enables us to solve problems with any number of salesmen (workers) and cities.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: The study introduces RL based approach for constructing the solution to mTSP, and demonstrates the potential that the learned policy can be effectively used to schedule multiple vehicles for solving large-scale, practical, real-world applications
Reviewed Version (pdf): https://openreview.net/references/pdf?id=WKfOH7cMA
12 Replies

Loading