Keywords: Deep RL in supply chain, PPO, optimization heuristics, imitation learning
Abstract: This article studies reinforcement learning (RL) methods for multi-echelon inventory optimization, one of the most natural real-world applications of RL. Many approaches have appeared in the operations research community in recent years; we approach the problem from the RL point of view. To this end, we design an abstraction that covers features of real-life supply chains typical of the process industry. The abstraction can be implemented as a Gymnasium environment and trained with standard algorithms. We propose combining material requirements planning (MRP) optimization heuristics from operations research with imitation learning to pre-train the RL algorithms. We experimentally compare PPO, with and without pre-training, against the MRP heuristic. In particular, a zero-shot comparison shows that deep RL agents generalize better than the heuristic to disruptions in the supply chain.
Supplementary Material: pdf
Submission Number: 155