Keywords: weakly-coupled Markov decision processes, fully heterogeneous systems, asymptotic optimality, planning, average-reward Markov decision processes, Lyapunov analysis
TL;DR: We use a novel projection-based Lyapunov function to prove the first asymptotic optimality theorem for fully heterogeneous weakly-coupled Markov decision processes.
Abstract: Heterogeneity poses a fundamental challenge for many real-world large-scale decision-making problems but remains largely understudied.
In this paper, we study the _fully heterogeneous_ setting of a prominent class of such problems, known as weakly-coupled Markov decision processes (WCMDPs).
Each WCMDP consists of $N$ arms (or subproblems), each with its own model parameters in the fully heterogeneous setting, leading to the curse of dimensionality when $N$ is large.
We show that, under mild assumptions, an efficiently computable policy achieves an $O(1/\sqrt{N})$ optimality gap in the long-run average reward per arm for fully heterogeneous WCMDPs as $N$ becomes large.
This is the _first asymptotic optimality result_ for fully heterogeneous average-reward WCMDPs.
Our main technical innovation is the construction of projection-based Lyapunov functions that certify the convergence of rewards and costs to an optimal region, even under full heterogeneity.
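In symbols, the claimed guarantee reads roughly as follows (a hedged sketch of the statement in the abstract; the shorthand $R_N^{\pi}$ and $R_N^{*}$ is ours, not the paper's notation):

```latex
% Rough formalization of the claimed optimality gap.
% R_N^{\pi} and R_N^{*} are assumed shorthand, not taken from the paper.
\[
  R_N^{*} \;-\; R_N^{\pi} \;=\; O\!\left(\frac{1}{\sqrt{N}}\right)
  \qquad \text{as } N \to \infty,
\]
where $R_N^{\pi}$ denotes the long-run average reward per arm of the efficiently
computable policy $\pi$, and $R_N^{*}$ denotes the optimal long-run average reward
per arm of the $N$-arm fully heterogeneous WCMDP.
```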
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 9466