Reinforcement Learning of Supply Chain Control Policy Using Closed Loop Multi-agent Simulation

Souvik Barat, Prashant Kumar, Monika Gajrani, Harshad Khadilkar, Hardik Meisheri, Vinita Baniwal, Vinay Kulkarni

2019 (modified: 10 Nov 2021), MABS 2019

Abstract: Reinforcement Learning (RL) has achieved a degree of success in control applications such as online gameplay and autonomous driving, but has rarely been used to manage operations of business-critical systems such as supply chains. A key aspect of using RL in the real world is to train the agent before deployment by computing the effect of its exploratory actions on the environment. While this effect is easy to compute for online gameplay (where the rules of the game are well known) and autonomous driving (where the dynamics of the vehicle are predictable), it is much more difficult for complex systems because of their uncertainty, adaptability and emergent behaviour. In this paper, we describe a framework for effective integration of a reinforcement learning controller with an actor-based multi-agent simulation of the supply chain network, including the warehouse, transportation system, and stores, with the objective of maximising product availability while minimising wastage under constraints.
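The closed loop described in the abstract (observe the simulated state, choose a replenishment action, let the simulation compute the effect, score availability against wastage) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `ToyStore` class, the `rollout` function, the shelf life, capacity, demand range, and the `waste_weight` in the reward are all hypothetical, and a fixed order-up-to rule stands in for the place where an RL controller would choose the action.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical single-product store; NOT the paper's actor model."""
    capacity: int = 20      # assumed shelf capacity (units)
    shelf_life: int = 3     # assumed number of days a unit can be sold
    batches: list = field(default_factory=list)  # remaining life per unit

    def step(self, order_qty, demand):
        # Age the stock by one day and discard expired units (wastage).
        self.batches = [life - 1 for life in self.batches]
        wasted = sum(1 for life in self.batches if life <= 0)
        self.batches = [life for life in self.batches if life > 0]
        # Receive replenishment, limited by the remaining shelf capacity.
        accepted = min(order_qty, self.capacity - len(self.batches))
        self.batches += [self.shelf_life] * accepted
        # Serve demand from the oldest units first.
        self.batches.sort()
        sold = min(demand, len(self.batches))
        self.batches = self.batches[sold:]
        return sold, wasted


def rollout(policy, days=30, seed=0, waste_weight=0.5):
    """Run the closed loop and return the mean daily reward."""
    rng = random.Random(seed)
    store = ToyStore()
    total = 0.0
    for _ in range(days):
        inventory = len(store.batches)   # observation given to the controller
        order_qty = policy(inventory)    # action chosen by the controller
        demand = rng.randint(1, 10)      # assumed stochastic demand
        sold, wasted = store.step(order_qty, demand)
        # Assumed reward: fraction of demand served minus a wastage penalty.
        total += sold / demand - waste_weight * wasted / store.capacity
    return total / days


def order_up_to(level):
    """Placeholder policy; an RL agent would replace this function."""
    return lambda inventory: max(0, level - inventory)


if __name__ == "__main__":
    for level in (4, 8, 12, 16):
        print(level, round(rollout(order_up_to(level)), 3))
```

Running the script prints the mean reward for a few order-up-to levels, which shows the trade-off the abstract names: ordering too little lowers availability, while ordering too much raises wastage. A learning agent would use the same loop, updating its policy from the observed rewards instead of keeping the rule fixed.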
