Online Learning of Weakly Coupled MDP Policies for Load Balancing and Auto Scaling

S. R. Eshwar, Lucas Lopes Felipe, Alexandre Reiffers-Masson, Daniel Sadoc Menasché, Gugan Thoppe

Published: 2024, Last Modified: 16 May 2025CoRR 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Load balancing and auto scaling are at the core of scalable, contemporary systems, addressing dynamic resource allocation and service rate adjustments in response to workload changes. This paper introduces a novel model and algorithms for tuning load balancers coupled with auto scalers, considering bursty traffic arriving at finite queues. We begin by presenting the problem as a weakly coupled Markov Decision Processes (MDP), solvable via a linear program (LP). However, as the number of control variables of such LP grows combinatorially, we introduce a more tractable relaxed LP formulation, and extend it to tackle the problem of online parameter learning and policy optimization using a two-timescale algorithm based on the LP Lagrangian.