Keywords: Online Learning, Queueing, Service mode control, Unknown parameters
TL;DR: We develop online learning algorithms for dynamic service mode control in queues, achieving near-optimal regret through regenerative analysis and validating the approach in AI-assisted healthcare operations.
Abstract: Modern service systems are increasingly adopting new modalities enabled by emerging technologies, such as AI-assisted services, to better balance quality and efficiency. Motivated by this trend, we study dynamic service mode control in a single-server queue with two switchable modes, each with a distinct service rate and an unknown reward distribution. The objective is to maximize the long-run average of expected cumulative rewards minus holding costs achievable under non-anticipating, state-dependent policies. To address the problem, we first establish the optimality of a threshold policy under full information about the problem primitives. When reward distributions are unknown but samples are observable, we propose an online learning algorithm that uses Upper Confidence Bound (UCB) estimates of the unknown parameters to adaptively learn the optimal threshold. Our algorithm achieves statistically near-optimal regret of $\tilde{O}(\sqrt{T})$ and demonstrates strong numerical performance. Additionally, when partial information about the optimal policy is available ex ante (specifically, a non-trivial lower bound on the optimal threshold), we show that an episodic greedy policy achieves constant regret by leveraging a free-exploration property intrinsic to this special setting. Methodologically, we develop a novel regret decomposition and a regenerative cycle-based analysis, offering general tools for learning-based queueing control. Lastly, we conduct a healthcare case study on AI-assisted patient messaging that demonstrates the practical utility of our approach.
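The UCB estimation idea mentioned in the abstract can be sketched as follows. This is a minimal, hypothetical illustration: the arrival rate, service rates, reward distributions, and discrete-time queue dynamics are all assumptions chosen for the example, and the snippet shows only the UCB-based mode-selection component, not the paper's full threshold-learning algorithm or its regret analysis.

```python
import math
import random

# Illustrative two-mode queue: each mode has a service rate and an unknown
# mean reward per completed service. All numbers below are assumptions.
random.seed(0)

LAM = 0.9                # arrival rate (assumed)
MU = [1.0, 2.0]          # service rates of modes 0 and 1 (assumed)
TRUE_R = [1.5, 0.8]      # unknown mean rewards per service (assumed)

counts = [0, 0]          # completed services observed per mode
sums = [0.0, 0.0]        # accumulated reward samples per mode

def ucb(mode, t):
    """Optimistic (upper confidence bound) estimate of a mode's mean reward."""
    if counts[mode] == 0:
        return float("inf")  # force at least one sample of each mode
    mean = sums[mode] / counts[mode]
    return mean + math.sqrt(2.0 * math.log(t + 1) / counts[mode])

queue = 0
for t in range(1, 20001):
    # Pick the mode with the larger UCB index; the paper instead learns a
    # queue-length threshold, which this toy rule does not capture.
    mode = 0 if ucb(0, t) >= ucb(1, t) else 1
    # Crude discrete-time approximation of the queue dynamics: each step is
    # an arrival or a (potential) service completion.
    if random.random() < LAM / (LAM + MU[mode]):
        queue += 1
    elif queue > 0:
        queue -= 1
        # Observe a noisy reward sample for the mode that served the customer.
        reward = random.gauss(TRUE_R[mode], 0.2)
        counts[mode] += 1
        sums[mode] += reward
```

Over time the UCB indices concentrate on the true means, so the higher-reward mode is selected for all but a logarithmic number of services, which is the standard exploration-exploitation trade-off the paper's learning algorithm builds on.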
Submission Number: 178