Keywords: Reinforcement Learning, Queueing Theory, Admission Control, Two-sided Markets
Abstract: Two-sided queues are a useful formalism for modeling two-sided markets, as well as more general systems in which work is conserved. Moreover, in practical applications the arrival rates of different entities are often unknown and may vary with the state. General-purpose reinforcement learning algorithms may struggle at scale because their regret depends on the diameter of the Markov Decision Process (MDP), which in queueing systems often grows exponentially with the size of the state space. To address these issues, we present an algorithm with a diameter-independent regret bound for the problem of admission control in a two-sided queue. Letting $S$ denote the size of the state space, $N$ the number of types, $T$ the number of steps, and $\kappa$ the ratio between the upper and lower rate bounds, we show that our algorithm achieves a regret bound of $\tilde{O}(\kappa^{3} S^{1.5} \sqrt{T}+\kappa^{2.5} S^{1.5} \sqrt{NT})$. An empirical study then shows that it can significantly outperform general-purpose algorithms.
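To make the admission-control setting concrete, here is a minimal simulation sketch. It is not the authors' algorithm or their exact model. It assumes a simplified two-sided queue with a single pair of sides (rather than $N$ types), a finite buffer `K`, uniformized discrete-time arrivals with state-dependent rates, an illustrative reward per match and holding cost per waiting entity. The names `simulate`, `threshold_policy`, `match_reward`, and `hold_cost`, and the example rate functions, are all hypothetical.

```python
import random
from typing import Callable, Tuple

# Hypothetical types: a rate function maps the current state to a positive rate,
# and a policy maps (state, arriving side) to an admit/reject decision.
Rate = Callable[[int], float]
Policy = Callable[[int, int], bool]


def simulate(lam_plus: Rate, lam_minus: Rate, admit: Policy, K: int,
             steps: int, match_reward: float = 1.0, hold_cost: float = 0.05,
             seed: int = 0) -> Tuple[float, int]:
    """Return (average reward per step, largest |imbalance| visited).

    Assumed model: the state x is the signed imbalance; x > 0 means x "+"
    entities are waiting, x < 0 means -x "-" entities are waiting.
    """
    rng = random.Random(seed)
    x = 0
    total = 0.0
    max_abs = 0
    for _ in range(steps):
        # Uniformized arrival: "+" arrives with probability proportional to its rate.
        up, down = lam_plus(x), lam_minus(x)
        side = 1 if rng.random() < up / (up + down) else -1
        nxt = x + side
        # Admit only if the policy agrees and the buffer bound K is respected.
        if abs(nxt) <= K and admit(x, side):
            if x * side < 0:  # an opposite-side entity is waiting: match immediately
                total += match_reward
            x = nxt
        total -= hold_cost * abs(x)  # holding cost on unmatched entities
        max_abs = max(max_abs, abs(x))
    return total / steps, max_abs


def threshold_policy(c: int) -> Policy:
    """Admit an arrival only if the resulting imbalance stays within [-c, c]."""
    return lambda x, side: abs(x + side) <= c


if __name__ == "__main__":
    lam_plus = lambda x: 1.0 / (1 + max(x, 0))  # hypothetical: "+" arrivals thin out as "+" entities queue up
    lam_minus = lambda x: 0.6                   # hypothetical constant rate for the "-" side
    for c in (0, 1, 3):
        avg, peak = simulate(lam_plus, lam_minus, threshold_policy(c), K=10, steps=20_000)
        print(f"threshold={c}: average reward {avg:.3f}, peak imbalance {peak}")
```

In this toy model a threshold of 0 rejects every arrival, so nothing is ever matched; larger thresholds trade matches against holding costs. A learning algorithm in the paper's setting would have to choose such admission decisions without knowing `lam_plus` and `lam_minus` in advance.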
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 11245