Lyapunov-Based Sample Complexity Analysis for Weakly-Coupled MDPs

Published: 28 Nov 2025, Last Modified: 30 Nov 2025 | NeurIPS 2025 Workshop MLxOR | License: CC BY 4.0
Keywords: Weakly-Coupled MDPs; Sample Complexity; Average Reward; Reinforcement Learning
TL;DR: We study learning in average-reward weakly coupled Markov decision processes (WCMDPs) with heterogeneous arms.
Abstract: We study learning in average-reward weakly coupled Markov decision processes (WCMDPs) with heterogeneous arms. Naive approaches incur computational and sample complexity that grows exponentially in the number of subsystems. We analyze a plug-in approach built on an efficient planning algorithm and show that it attains the first finite-sample (PAC) optimality-gap guarantees with polynomial sample complexity. The result is established under a new framework that combines a Lyapunov analysis of a reference policy with a Lyapunov drift transfer technique, which can be viewed as a generalization of the classical simulation lemma.
Submission Number: 189