Achieving $\widetilde{\mathcal O}(1/N)$ Optimality Gap in Weakly-Coupled Markov Decision Processes through Gaussian Approximation

Published: 28 Nov 2025, Last Modified: 30 Nov 2025
Venue: NeurIPS 2025 Workshop MLxOR
License: CC BY 4.0
Keywords: Markov Decision Processes, Gaussian Approximation, Stochastic Optimization
TL;DR: We propose a stochastic-programming-based (SP-based) policy for finite-horizon weakly-coupled MDPs (WCMDPs) that achieves an $\widetilde{\mathcal O}(1/N)$ optimality gap per agent, improving on the $\mathcal{O}(1/\sqrt{N})$ limit of LP-based methods in degenerate regimes.
Abstract: We study finite-horizon weakly-coupled Markov decision processes (WCMDPs) with $N$ homogeneous agents, where each agent is modeled as an MDP. Prior work has shown that linear-programming-based (LP-based) policies, derived from the fluid approximation that captures the system’s mean dynamics, achieve an $\mathcal{O}(1/\sqrt{N})$ optimality gap per agent. In this paper, we present instances where this gap is in fact $\Theta(1/\sqrt{N})$. We further propose a novel stochastic-programming-based (SP-based) policy that, under a mild uniqueness assumption, achieves an $\widetilde{\mathcal O}(1/N)$ optimality gap per agent. Our approach constructs a Gaussian stochastic system centered around the fluid-optimal trajectory, capturing both the mean and the variance of the WCMDP dynamics. This results in a more accurate approximation than the fluid approximation. The policy is then obtained by solving a linear Gaussian stochastic program for this system. To the best of our knowledge, this is the first result to establish an $\widetilde{\mathcal O}(1/N)$ optimality gap for WCMDPs under a uniqueness assumption.
Submission Number: 122
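
The abstract's central idea, a Gaussian system that tracks both the mean and the variance of the $N$-agent dynamics, can be illustrated with a minimal sketch. It is not the paper's policy or algorithm: it ignores actions and coupling constraints entirely and only shows the standard central-limit scaling that motivates a Gaussian approximation. The transition matrix `P`, the distribution `x`, and both function names are hypothetical, chosen for illustration. With $N$ agents moving independently, the next empirical state distribution has mean `x @ P` (the fluid step) and covariance of order $1/N$, so fluctuations around the fluid trajectory are of order $1/\sqrt{N}$.

```python
import numpy as np


def one_step_gaussian_moments(x, P, N):
    """Mean and covariance of the next empirical state distribution.

    x : fraction of agents in each state (sums to 1)
    P : row-stochastic transition matrix (hypothetical, no actions)
    N : number of agents
    Agents transition independently, so the counts leaving state s are
    multinomial with parameters (N * x[s], P[s]).
    """
    x = np.asarray(x, dtype=float)
    P = np.asarray(P, dtype=float)
    num_states = len(x)
    mean = x @ P  # fluid (mean-field) step
    cov = np.zeros((num_states, num_states))
    for s in range(num_states):
        # Covariance of a single categorical draw from row P[s].
        cov += x[s] * (np.diag(P[s]) - np.outer(P[s], P[s]))
    return mean, cov / N  # covariance shrinks like 1/N


def simulate_next_distribution(x, P, N, num_samples, rng):
    """Monte Carlo samples of the next empirical state distribution."""
    x = np.asarray(x, dtype=float)
    P = np.asarray(P, dtype=float)
    counts = np.rint(N * x).astype(int)  # assumes N * x is integral
    samples = np.zeros((num_samples, len(x)))
    for s in range(len(x)):
        samples += rng.multinomial(counts[s], P[s], size=num_samples)
    return samples / N


# Illustrative 3-state example (all numbers are assumptions).
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])
x = np.array([0.5, 0.3, 0.2])

rng = np.random.default_rng(0)
for N in (100, 1000, 10000):
    mean, cov = one_step_gaussian_moments(x, P, N)
    samples = simulate_next_distribution(x, P, N, 20000, rng)
    print(N, np.sqrt(np.diag(cov)), samples.std(axis=0))
```

Each printed row compares the predicted standard deviations with the simulated ones; both should shrink by roughly $\sqrt{10}$ each time $N$ grows tenfold. A fluid (LP-based) model keeps only `mean`, whereas a Gaussian model also keeps `cov`.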