Middle-Mile Logistics Through the Lens of Goal-Conditioned Reinforcement Learning
Keywords: reinforcement learning, operations research, logistics
TL;DR: We phrase middle-mile logistics as a goal-conditioned reinforcement learning (GCRL) problem and experiment with PPO on our open-source environment.
Abstract: Middle-mile logistics describes the problem of routing parcels through a network of hubs, which are linked by a fixed set of trucks. The main challenge comes from the finite capacity of the trucks: allocating a parcel to a specific truck might block another parcel from using the same truck. It is thus necessary to solve for all parcel routes simultaneously. Exact solution methods scale poorly with problem size, so real-world instances are intractable. Instead, we turn to reinforcement learning (RL) by rephrasing the middle-mile problem as a multi-object goal-conditioned Markov decision process. The key ingredients of our proposed method for parcel routing are the extraction of small feature graphs from the environment state and the combination of graph neural networks with model-free RL. Several open challenges remain, and we provide an open-source implementation of the environment to encourage stronger cooperation between the reinforcement learning and logistics communities.
Submission Number: 23
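
The capacity conflict described in the abstract can be illustrated with a toy instance. The sketch below is not the authors' environment: the names `Truck`, `Parcel`, `assign`, `reward`, and `run` are hypothetical, truck departure times are ignored, and the reward (number of parcels at their goal hub) is an assumed goal-conditioned objective chosen only for illustration. It shows why routing one parcel in isolation can block another, and why the routes have to be chosen jointly.

```python
from dataclasses import dataclass, field


@dataclass
class Truck:
    origin: str
    destination: str
    capacity: int
    load: list = field(default_factory=list)  # ids of parcels already on board

    def has_room(self) -> bool:
        return len(self.load) < self.capacity


@dataclass
class Parcel:
    pid: int
    location: str
    goal: str  # the goal hub that conditions the task for this parcel

    def delivered(self) -> bool:
        return self.location == self.goal


def assign(parcel: Parcel, truck: Truck) -> bool:
    """Put `parcel` on `truck` if feasible; return False otherwise."""
    if truck.origin != parcel.location or not truck.has_room():
        return False
    truck.load.append(parcel.pid)
    parcel.location = truck.destination  # toy simplification: instant transport
    return True


def reward(parcels) -> int:
    """Assumed objective: count of parcels that reached their goal hub."""
    return sum(p.delivered() for p in parcels)


def run(plan) -> int:
    """Apply a list of (parcel id, truck index) actions to a fresh toy instance."""
    trucks = [Truck("A", "B", 1), Truck("B", "C", 1), Truck("A", "C", 1)]
    parcels = [Parcel(0, "A", "C"), Parcel(1, "A", "B")]
    for pid, tid in plan:
        assign(parcels[pid], trucks[tid])
    return reward(parcels)


# Parcel 0 travels A->B->C and fills the only A->B slot, so parcel 1 is stranded.
blocked = run([(0, 0), (0, 1), (1, 0)])
# Planned jointly: parcel 0 takes the direct A->C truck, leaving A->B for parcel 1.
joint = run([(0, 2), (1, 0)])
print(blocked, joint)  # 1 2
```

Both plans are individually valid for parcel 0, yet only the second delivers both parcels. A learned policy would have to discover this kind of coordination from the environment state rather than from a hand-written plan.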