Learning an Inventory Control Policy with General Inventory Arrival Dynamics

Sohrab Andaz; Randy Jia; Carson Eisenach; Dhruv Madeka; Kari Torkkola; Dean Foster; Sham M. Kakade

21 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: reinforcement learning, inventory control

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: We apply deep reinforcement learning (RL) to solve the periodic review inventory control problem with general arrival dynamics learned from historical data, and derive a learnability result for our problem formulation.

Abstract: We apply deep reinforcement learning (RL) to solve the periodic review inventory control problem with general arrival dynamics. In this work, we incorporate a learned model of transition dynamics (inventory arrivals) into the inventory control problem formulation, increasing the fidelity of the resulting simulator. Leveraging recent results (Madeka et al., 2022), we reduce the complexity of the inventory control problem we consider to that of supervised learning, proving that under mild assumptions our backtest of inventory control policies is accurate. We also propose several metrics for evaluating the inventory arrivals model, and demonstrate the impact of an improved arrivals model on policy performance by comparing policies learned on our simulator with a policy learned on a simulator with less accurate arrival dynamics. Finally, we use data from a real-world A/B test of an RL agent trained on our simulator with learned dynamics to evaluate the arrivals model, showing that empirically it generalizes well to the off-policy state distribution induced by the RL agent in an actual supply chain.
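To make "general arrival dynamics" concrete, here is a minimal sketch under our own assumptions. It is not the authors' simulator. It shows a lost-sales periodic review loop in which the arrivals for each order come from a pluggable model. `fixed_lead_time` encodes the classical assumption that an order arrives in full after a fixed lead time. `random_split` is a hypothetical stand-in for a learned arrivals model: it delivers only part of an order and spreads it over several future periods. All names, the reward (revenue minus purchase cost on receipt minus holding cost), and the parameter values are illustrative.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical interface (not from the paper): an arrival model maps an order
# quantity to a list whose entry k is the quantity arriving k + 1 periods later.
ArrivalModel = Callable[[float, random.Random], List[float]]
Policy = Callable[[float, List[float]], float]


def fixed_lead_time(lead: int) -> ArrivalModel:
    """Classical assumption: the full order arrives exactly `lead` periods later."""
    def model(qty: float, rng: random.Random) -> List[float]:
        out = [0.0] * lead
        out[lead - 1] = qty
        return out
    return model


def random_split(max_lead: int, fill_rate: float) -> ArrivalModel:
    """Illustrative stand-in for a learned model: only `fill_rate` of the order
    arrives, spread over the next `max_lead` periods with random weights."""
    def model(qty: float, rng: random.Random) -> List[float]:
        weights = [1.0 - rng.random() for _ in range(max_lead)]  # each in (0, 1]
        total = sum(weights)
        return [qty * fill_rate * w / total for w in weights]
    return model


def simulate(policy: Policy, arrivals: ArrivalModel, demands: List[float],
             max_lead: int, price: float = 1.0, unit_cost: float = 0.5,
             holding: float = 0.1, seed: int = 0) -> Tuple[float, List[float]]:
    """Periodic review with lost sales; returns (total reward, per-period sales)."""
    rng = random.Random(seed)
    # At the top of each period, pipeline[k] holds units due in k periods.
    pipeline = [0.0] * max_lead
    on_hand, total, sales_log = 0.0, 0.0, []
    for demand in demands:
        received = pipeline.pop(0)        # 1. receive this period's arrivals
        pipeline.append(0.0)              #    now pipeline[k] is due in k + 1 periods
        on_hand += received
        qty = policy(on_hand, list(pipeline))  # 2. review the state and order
        schedule = arrivals(qty, rng)
        assert len(schedule) <= max_lead
        for k, q in enumerate(schedule):
            pipeline[k] += q
        sales = min(on_hand, demand)      # 3. serve demand; unmet demand is lost
        on_hand -= sales
        sales_log.append(sales)
        total += price * sales - unit_cost * received - holding * on_hand
    return total, sales_log
```

For example, `simulate(lambda inv, pipe: 5.0, fixed_lead_time(2), [3.0] * 4, max_lead=2)` yields sales `[0.0, 0.0, 3.0, 3.0]`, because the first order only lands two periods after it is placed. Swapping in `random_split(2, 0.9)` while keeping the same policy changes the backtested reward through the arrival assumptions alone. This is the kind of sensitivity that motivates learning the arrivals model from data.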

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 4246
