Achieving Exponential Asymptotic Optimality in Average-Reward Restless Bandits without Global Attractor Assumption

Published: 28 Nov 2025, Last Modified: 30 Nov 2025, NeurIPS 2025 Workshop MLxOR, CC BY 4.0
Keywords: restless bandits, long-run average reward, exponential asymptotic optimality
TL;DR: We propose the first policy that achieves exponential asymptotic optimality in average-reward restless bandits, under a set of assumptions that are weaker and more fundamental than those in prior work.
Abstract: We study the infinite-horizon average-reward restless bandit (RB) problem, a representative class of problems within the broader framework of weakly-coupled Markov decision processes (MDPs). Each RB problem consists of $N$ MDPs coupled by a resource constraint. Existing computationally efficient policies either only achieve an $O(1/\sqrt{N})$ optimality gap or require a strong *global attractor assumption* to achieve an exponentially small $O(\exp(-C N))$ optimality gap. In this paper, we propose a novel *two-set policy* that achieves an $O(\exp(-C N))$ optimality gap under the weaker and easily verifiable assumptions of aperiodic unichain, non-degeneracy, and local stability. We further show that dropping *any* of these three assumptions precludes an exponential optimality gap, with local stability playing a particularly fundamental role as demonstrated by our lower bound. Finally, our experimental results confirm that the two-set policy outperforms existing policies when our assumptions are met but not the global attractor assumption, while remaining competitive across general settings.
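For context, the problem class can be sketched as follows; this is a standard average-reward RB formulation with an equality budget constraint, and the specific symbols (per-arm state $s^i_t$, binary action $a^i_t$, reward $r$, budget fraction $\alpha$) are generic assumptions that may differ from the paper's own notation:
$$
\max_{\pi}\;\liminf_{T\to\infty}\;\frac{1}{T}\,\mathbb{E}_{\pi}\!\left[\sum_{t=1}^{T}\sum_{i=1}^{N} r\bigl(s^i_t, a^i_t\bigr)\right]
\quad\text{subject to}\quad \sum_{i=1}^{N} a^i_t = \alpha N,\;\; a^i_t\in\{0,1\}\;\;\text{for all } t.
$$
Here each arm $i$ evolves as its own MDP whose transition kernel depends on whether it is activated ($a^i_t=1$) or passive ($a^i_t=0$), and the constraint couples the $N$ arms by limiting the number activated at each time step; the optimality gap of a policy is measured against the optimal value of this coupled problem as $N$ grows.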
Submission Number: 58