Keywords: Reinforcement Learning, Supply Chain Management, Inventory Management, Integer Programming
TL;DR: Combining integer programming and sample average approximation with DRL beats both state-of-the-art vanilla-DRL baselines and commonly used heuristics for supply-chain inventory optimization.
Abstract: Reinforcement Learning has led to considerable breakthroughs in diverse areas such as robotics, games, and many others. But the application of RL to complex real-world decision-making problems remains limited. Many problems in Operations Management (inventory and revenue management, for example) are characterized by large action spaces and stochastic system dynamics. These characteristics make the problem considerably harder to solve for existing RL methods that rely on enumeration techniques to solve per-step action problems. To resolve these issues, we develop Programmable Actor Reinforcement Learning (PARL), a policy iteration method that uses techniques from integer programming and sample average approximation. Analytically, we show that for a given critic, the learned policy in each iteration converges to the optimal policy as the number of underlying samples of the uncertainty goes to infinity. Practically, we show that a properly selected discretization of the underlying uncertainty distribution can yield a near-optimal actor policy even with very few samples from the underlying uncertainty. We then apply our algorithm to real-world inventory management problems with complex supply chain structures and show that PARL outperforms state-of-the-art RL and inventory optimization methods in these settings. We find that PARL outperforms a commonly used base-stock heuristic by 51.3% and RL-based methods by up to 9.58% on average across different supply chain environments.
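To make the per-step actor problem concrete, below is a minimal sketch of the kind of sample-average-approximation problem solved as an integer program at each decision step. It assumes a single-node inventory environment and a hypothetical linear critic surrogate; the paper embeds a learned neural critic, and all parameter names and values here (prices, costs, the weights `w`, `b`, the demand distribution) are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch: per-step action selection via sample average
# approximation (SAA) over demand scenarios, solved as a MILP with PuLP.
# The learned neural critic is replaced by a hypothetical linear surrogate
# V(s') ~ w * s' + b so the whole program stays a small linear MILP.
import numpy as np
import pulp

rng = np.random.default_rng(0)

# Environment data (assumed for illustration).
stock = 5                            # current on-hand inventory (state s)
price, cost, hold = 4.0, 1.0, 0.5    # selling price, unit order cost, holding cost
gamma = 0.99                         # discount factor
w, b = 0.3, 0.0                      # hypothetical linear critic surrogate
K = 20                               # number of sampled demand scenarios
demand = [int(d) for d in rng.poisson(8, size=K)]  # SAA samples of demand

prob = pulp.LpProblem("per_step_actor", pulp.LpMaximize)
q = pulp.LpVariable("order_qty", lowBound=0, upBound=30, cat="Integer")
sales = pulp.LpVariable.dicts("sales", range(K), lowBound=0)
left = pulp.LpVariable.dicts("leftover", range(K), lowBound=0)

for k in range(K):
    # Sales in scenario k are capped by both realized demand and available
    # stock; the leftover inventory becomes the next state s'.
    prob += sales[k] <= demand[k]
    prob += sales[k] <= stock + q
    prob += left[k] == stock + q - sales[k]

# SAA objective: average one-step reward plus discounted critic value of the
# next state, minus the (scenario-independent) ordering cost.
prob += (1.0 / K) * pulp.lpSum(
    price * sales[k] - hold * left[k] + gamma * (w * left[k] + b)
    for k in range(K)
) - cost * q

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print("optimal order quantity:", pulp.value(q))
```

Because the action is chosen by a solver rather than by enumerating the action space, this formulation extends naturally to the multi-node supply chains considered in the paper, where joint ordering decisions make enumeration intractable.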
Supplementary Material: zip