Abstract: Modern service systems are increasingly adopting innovative modalities enabled by emerging technologies,
such as AI-assisted services, to achieve a better balance between quality and efficiency. Motivated by these
advancements, we study the optimal dynamic control of service modes in systems with unknown parameters,
aiming to balance reward accumulation and system congestion. We consider a single-server queueing system
with two switchable service modes, each characterized by a distinct service rate and an unknown distribution
of rewards earned upon service completion. The objective is to maximize the long-run average of cumulative
rewards minus holding costs. To address this problem, we first characterize the optimal state-dependent
policy under full knowledge of model parameters. For the case of unknown reward distributions, we propose
an online learning algorithm based on Upper Confidence Bound estimates to adaptively learn the optimal
policy. Our algorithm achieves a statistically near-optimal regret bound of O(√T) over a time horizon T and
demonstrates strong numerical performance. A key methodological contribution is a novel regret decomposition and a regenerative cycle-based framework, offering broader insights into learning-based optimal control
in queueing systems. Finally, we demonstrate the practical relevance of our approach through a case study
on optimizing AI-assisted patient message replies in a healthcare setting.
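As an illustrative sketch (the notation below is assumed for exposition and is not taken verbatim from the paper), the objective described in the abstract can be written as

\[
\max_{\pi}\ \liminf_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi}\!\left[\, \sum_{i=1}^{N(T)} R_i \;-\; h \int_{0}^{T} Q(t)\, \mathrm{d}t \,\right],
\]

where \(\pi\) is a state-dependent mode-switching policy, \(N(T)\) is the number of service completions by time \(T\), \(R_i\) is the (random) reward earned at the \(i\)-th completion, \(Q(t)\) is the queue length at time \(t\), and \(h\) is the per-unit-time holding cost.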
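A minimal, hedged sketch of the Upper Confidence Bound ingredient mentioned above, assuming a simple per-mode reward estimator with a standard confidence bonus; the class and variable names are illustrative, and the sketch deliberately omits the queueing state, service rates, and holding costs that the paper's state-dependent policy accounts for.

# Minimal sketch (not the paper's algorithm): UCB-style per-mode reward
# estimates of the kind that could drive a mode-switching rule.
# All names (UCBRewardEstimator, true_means) are illustrative assumptions.
import math
import random


class UCBRewardEstimator:
    """Tracks empirical mean rewards per service mode with a UCB bonus."""

    def __init__(self, num_modes: int):
        self.counts = [0] * num_modes
        self.sums = [0.0] * num_modes

    def update(self, mode: int, reward: float) -> None:
        # Record the reward observed at a service completion under `mode`.
        self.counts[mode] += 1
        self.sums[mode] += reward

    def ucb(self, mode: int, t: int) -> float:
        # Optimistic estimate: empirical mean plus a confidence radius
        # shrinking at the usual sqrt(log t / n) rate.
        if self.counts[mode] == 0:
            return float("inf")  # force initial exploration of each mode
        mean = self.sums[mode] / self.counts[mode]
        bonus = math.sqrt(2.0 * math.log(max(t, 2)) / self.counts[mode])
        return mean + bonus


if __name__ == "__main__":
    # Toy usage: two modes with unknown Bernoulli rewards; pick the mode
    # with the larger UCB index at each (discretized) decision epoch.
    true_means = [0.4, 0.7]
    est = UCBRewardEstimator(num_modes=2)
    for t in range(1, 1001):
        mode = max(range(2), key=lambda m: est.ucb(m, t))
        reward = float(random.random() < true_means[mode])
        est.update(mode, reward)
    print("completions per mode:", est.counts)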