Keywords: Inventory Management, Multi-Period Inventory Problem, Deep Learning, Replenishment Policy
TL;DR: This paper proposes a deep neural network model for the multi-period inventory problem that learns and predicts the optimal parameters of an (s,S) policy.
Abstract: We propose a supervised learning algorithm for the multi-period inventory problem (MPIP) that tackles the shortcomings of existing multi-step, model-based methods on the one hand and of policy-free reinforcement learning algorithms on the other. As a model-free end-to-end (E2E) method that takes advantage of auxiliary data, it avoids pitfalls such as model misspecification, multi-step error accumulation, and the computational complexity induced by a repeated optimization step. Furthermore, it leverages domain knowledge about the structure of the optimal solution. To the best of our knowledge, this is one of the first supervised learning approaches to solve the MPIP and the first to learn policy parameters. Given the variety of settings in which OR researchers have developed well-performing policies, our approach can serve as a blueprint for designing E2E methods that leverage that knowledge. We validate our hypotheses on synthetic data and demonstrate the effect of individual model characteristics.
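For readers unfamiliar with the policy class the model predicts, the following minimal sketch illustrates how an (s,S) policy governs replenishment in a multi-period setting. The function names, the zero-lead-time assumption, and the demand sequence are illustrative choices, not taken from the paper:

```python
def order_quantity(inventory, s, S):
    """(s,S) policy: order up to S whenever inventory is at or below the reorder point s."""
    return S - inventory if inventory <= s else 0

def simulate(s, S, demands, initial=0):
    """Simulate a multi-period inventory system under an (s,S) policy.

    Assumes zero lead time (orders arrive before demand is realized);
    negative inventory represents backorders.
    """
    inventory = initial
    orders = []
    for d in demands:
        q = order_quantity(inventory, s, S)
        orders.append(q)
        inventory += q  # replenishment arrives
        inventory -= d  # demand is served
    return orders, inventory

# Example: with s=5, S=20, orders are triggered only when stock drops to 5 or below.
orders, final_inventory = simulate(5, 20, demands=[3, 4, 10, 2])
```

The paper's contribution is to learn the parameters s and S directly from data with a deep neural network, rather than computing them via repeated optimization.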