Keywords: Weakly-Coupled MDPs; Sample Complexity; Average Reward; Reinforcement Learning
TL;DR: We study learning in average-reward weakly coupled Markov decision processes (WCMDPs) with heterogeneous arms.
Abstract: We study learning in average-reward weakly coupled Markov decision processes (WCMDPs) with heterogeneous arms. Naive approaches incur computational and sample complexity that grow exponentially in the number of subsystems. We analyze a plug-in approach built on an efficient planning algorithm and show that it attains the first finite-sample (PAC) optimality-gap guarantees with polynomial sample complexity. This result rests on a new framework that combines a Lyapunov analysis of a reference policy with a Lyapunov drift transfer technique, which can be viewed as a generalization of the classical simulation lemma.
Submission Number: 189