A Reinforcement Learning Approach for Joint Replenishment Policy in Multi-Product Inventory System

24 Apr 2019 (modified: 13 Jul 2022), RL4RealLife 2019, Readers: Everyone
Keywords: joint replenishment policy, multi-product inventory, stochastic inventory control, reinforcement learning, reward allocation
Abstract: This study proposes a reinforcement learning approach for finding a near-optimal dynamic ordering policy for a multi-product inventory system with non-stationary demand. The distinguishing feature of multi-product inventory systems is that ordering decisions must be coordinated across products to reduce total cost. Markov decision process formulations have been used to obtain optimal policies, but the curse of dimensionality makes them intractable for a large number of products. For larger numbers of products, the heuristic algorithms proposed in the literature assume stationary demand. We propose an extended Q-learning agent with function approximation, called the branching deep Q-network (DQN) with reward allocation, which is based on the branching double DQN. Our numerical experiments show that the proposed agent learns a coordinated ordering policy without any knowledge of other products' decisions and outperforms a non-coordinated, forecast-based economic order policy.
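To make the dimensionality argument concrete, the following is a minimal sketch, not the authors' implementation. It assumes one network branch per product and a small discrete set of order levels per product (both assumptions, as the abstract does not specify them), and it omits the reward allocation step entirely. The function names and Q-values are hypothetical. It compares the number of outputs a single joint Q-function would need with the number a branching architecture would need, and shows greedy action selection done independently per branch.

```python
def joint_action_count(num_products, num_order_levels):
    # A single Q-function over joint actions needs one output per
    # combination of order levels across all products.
    return num_order_levels ** num_products


def branching_output_count(num_products, num_order_levels):
    # A branching architecture needs one output per (product, order level)
    # pair, so the count grows linearly with the number of products.
    return num_products * num_order_levels


def greedy_branch_actions(branch_q_values):
    # For each branch (assumed here: one branch per product), pick the
    # index of the order level with the highest Q-value.
    return [max(range(len(q)), key=q.__getitem__) for q in branch_q_values]


# Hypothetical Q-values for 2 products with 3 order levels each.
example_branch_q = [
    [0.1, 0.7, 0.2],  # product 0
    [0.5, 0.4, 0.3],  # product 1
]
example_orders = greedy_branch_actions(example_branch_q)

# Size comparison for 10 products with 5 order levels each.
example_joint = joint_action_count(10, 5)
example_branching = branching_output_count(10, 5)
```

Under these assumptions, each branch selects its own order level without enumerating the other products' choices, which is what keeps the output size manageable as the number of products grows.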