Keywords: deep reinforcement learning, inventory management, policy regularization, basestock
Abstract: In the age of big data and large-scale compute, Deep Reinforcement Learning (DRL) provides a general-purpose methodology for optimizing inventory policies. However, off-the-shelf implementations of DRL have seen mixed success and are often highly sensitive to the hyperparameters used during training. In this paper, we show that imposing policy regularizations grounded in classical inventory concepts such as "Base Stock" can greatly accelerate hyperparameter tuning and improve the final performance of several DRL methods. We report details from a full-scale deployment of DRL with policy regularizations on Alibaba's e-commerce platform, Tmall. Our paper also includes extensive synthetic experiments, which show that policy regularizations change which DRL method performs best for inventory management.
Submission Number: 148