Deep RL for Multi-Echelon Supply Chains

RLC 2025 Conference Submission 155 Authors

20 Feb 2025 (modified: 09 May 2025), Submitted to RLC 2025, CC BY 4.0
Keywords: Deep RL in supply chain, PPO, optimization heuristics, imitation learning
Abstract: This article studies reinforcement learning (RL) methods for multi-echelon inventory optimization, one of the most natural real-world applications of RL. The operations research community has made many attempts at this problem in recent years; we approach it from the RL point of view. To this end, we design an abstraction that covers features of real-life supply chains typical of the process industry. The abstraction can be implemented as a Gymnasium environment, so that agents can be trained with standard algorithms. We propose combining material requirements planning (MRP) optimization heuristics from operations research with imitation learning to pre-train the RL agents. We experimentally compare PPO, with and without pre-training, against the MRP heuristic. In particular, we give a zero-shot comparison to show that deep RL agents generalize better to disruptions in the supply chain.
Supplementary Material: pdf
Submission Number: 155
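
Illustrative sketch (not from the submission): the abstract does not specify the environment or the MRP heuristic, so the following is only a minimal, hypothetical example of the general idea. It assumes a two-echelon serial chain (warehouse and retailer) with a fixed lead time, uniform demand, and lost sales. It follows the Gymnasium `reset`/`step` return convention without importing the package, and it uses a simple base-stock rule as a stand-in for the MRP heuristic to generate (observation, action) pairs of the kind an imitation-learning pre-training step could consume. All class names, function names, cost parameters, and target levels are assumptions made for illustration.

```python
import random


class TwoEchelonEnv:
    """Hypothetical toy chain: supplier -> warehouse -> retailer -> customer."""

    def __init__(self, lead_time=2, horizon=20, max_demand=4,
                 holding_cost=1.0, lost_sale_cost=5.0):
        # All parameter values are illustrative assumptions; lead_time >= 1.
        self.lead_time = lead_time
        self.horizon = horizon
        self.max_demand = max_demand
        self.holding_cost = holding_cost
        self.lost_sale_cost = lost_sale_cost

    def _obs(self):
        # (warehouse stock, retailer stock, warehouse pipeline, retailer pipeline)
        return (*self.on_hand, *self.pipeline[0], *self.pipeline[1])

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.t = 0
        self.on_hand = [5, 5]
        self.pipeline = [[0] * self.lead_time for _ in range(2)]
        return self._obs(), {}

    def step(self, action):
        order_wh, order_rt = action
        # 1. Shipments sent lead_time periods ago arrive at each echelon.
        for i in range(2):
            self.on_hand[i] += self.pipeline[i].pop(0)
        # 2. The warehouse ships to the retailer, capped by its own stock.
        shipped = min(order_rt, self.on_hand[0])
        self.on_hand[0] -= shipped
        self.pipeline[1].append(shipped)
        # 3. The warehouse orders from an uncapacitated external supplier.
        self.pipeline[0].append(order_wh)
        # 4. Customer demand hits the retailer; unmet demand is lost.
        demand = self.rng.randint(0, self.max_demand)
        sold = min(demand, self.on_hand[1])
        self.on_hand[1] -= sold
        lost = demand - sold
        cost = self.holding_cost * sum(self.on_hand) + self.lost_sale_cost * lost
        self.t += 1
        truncated = self.t >= self.horizon
        return self._obs(), -cost, False, truncated, {"lost_sales": lost}


def base_stock_action(obs, lead_time, targets=(12, 8)):
    """Stand-in heuristic (not MRP): order up to a target inventory position."""
    wh_position = obs[0] + sum(obs[2:2 + lead_time])
    rt_position = obs[1] + sum(obs[2 + lead_time:2 + 2 * lead_time])
    return (max(0, targets[0] - wh_position), max(0, targets[1] - rt_position))


def collect_demonstrations(env, episodes, seed=0):
    """Roll out the heuristic and record (observation, action) pairs."""
    demos = []
    for ep in range(episodes):
        obs, _ = env.reset(seed=seed + ep)
        done = False
        while not done:
            action = base_stock_action(obs, env.lead_time)
            demos.append((obs, action))
            obs, _, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
    return demos


demos = collect_demonstrations(TwoEchelonEnv(), episodes=3)
print(len(demos), demos[0])
```

In a setup like this, the recorded pairs could serve as a supervised dataset for behavior cloning before PPO fine-tuning; the actual pre-training procedure used in the submission is not described in the abstract.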