Dynamic Control of Queuing Networks via Differentiable Discrete-Event Simulation
Keywords: queuing, RL, straight-through, discrete gradient
Abstract: Queuing network control arises in many applications, such as manufacturing, communication networks, call centers, and hospital systems. Reinforcement learning (RL) offers a broad set of tools for training controllers for general queuing networks, but standard model-free approaches suffer from high trajectory variance, large state and action spaces, and instability. In this work, we develop a modeling framework for queuing networks based on discrete-event simulation. This model allows us to leverage tools from the gradient estimation literature to compute approximate first-order gradients of sample-path performance metrics through automatic differentiation, despite the discrete dynamics of the system. Using this framework, we derive gradient-based RL algorithms for policy optimization and planning. We observe that these methods improve sample efficiency, stabilize the system even when starting from a random initialization, and can handle non-stationary, large-scale instances.
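To make the idea of pushing a gradient through discrete event dynamics concrete, here is a minimal sketch of a straight-through estimator on a toy two-queue, one-server model. Everything in it is an illustrative assumption and not the authors' framework or estimator: the model, the hypothetical `Dual` forward-mode autodiff class, the hypothetical policy parametrized by `theta`, and the arrival and service rates in `sample_path`. The forward pass is the exact discrete simulation on a fixed sample path; only the derivative of the hard routing decision is borrowed from a sigmoid relaxation.

```python
import math
import random


class Dual:
    """Forward-mode autodiff scalar: a value and its derivative w.r.t. one parameter."""

    def __init__(self, val, der=0.0):
        self.val = float(val)
        self.der = float(der)

    @staticmethod
    def lift(x):
        # Plain numbers are constants: derivative zero.
        return x if isinstance(x, Dual) else Dual(x, 0.0)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__


def sigmoid(x):
    # Numerically stable logistic function with its chain-rule derivative.
    if x.val >= 0:
        s = 1.0 / (1.0 + math.exp(-x.val))
    else:
        e = math.exp(x.val)
        s = e / (1.0 + e)
    return Dual(s, s * (1.0 - s) * x.der)


def straight_through(hard, soft):
    # Forward value is the discrete decision; derivative comes from the soft relaxation.
    return Dual(hard, soft.der)


def sample_path(horizon, seed, lam=(0.3, 0.3), mu=(0.7, 0.7)):
    # Hypothetical fixed sample path: per-step Bernoulli arrivals (a1, a2)
    # and potential service completions (s1, s2), drawn once and reused.
    rng = random.Random(seed)
    return [tuple(int(rng.random() < r) for r in (*lam, *mu)) for _ in range(horizon)]


def simulate(theta, path, cost=(1.0, 2.0)):
    """Two queues, one server. Returns total holding cost as a Dual in theta."""
    th = Dual.lift(theta)
    q1, q2 = Dual(0.0), Dual(0.0)
    total = Dual(0.0)
    for a1, a2, s1, s2 in path:
        # Hypothetical policy: serve queue 1 when theta * (q1 - q2) > 0.
        logit = th * (q1 - q2)
        u = straight_through(1.0 if logit.val > 0 else 0.0, sigmoid(logit))
        # Whether a queue is non-empty is treated as a constant (no gradient).
        busy1 = 1.0 if q1.val > 0 else 0.0
        busy2 = 1.0 if q2.val > 0 else 0.0
        q1 = q1 - u * (s1 * busy1) + a1
        q2 = q2 - (1.0 - u) * (s2 * busy2) + a2
        total = total + cost[0] * q1 + cost[1] * q2
    return total


path = sample_path(horizon=200, seed=0)
out = simulate(Dual(1.0, 1.0), path)  # seed derivative d(theta)/d(theta) = 1
print(out.val, out.der)
```

Because the hard simulation is piecewise constant in `theta` (with `theta > 0` every decision depends only on the sign of `q1 - q2`), a finite difference such as `(simulate(1.0 + 1e-6, path).val - simulate(1.0, path).val) / 1e-6` is exactly zero here, while `out.der` generally is not. The trade-off is bias: the straight-through derivative is a surrogate, not the true derivative of the expected cost.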
Submission Number: 27