Unichain and Aperiodicity Are Sufficient for Asymptotic Optimality of Average-Reward Restless Bandits
Abstract: We consider the infinite-horizon, average-reward restless bandit problem in discrete time. We propose a new class of policies designed to drive a progressively larger subset of arms toward the optimal distribution. We show that our policies are asymptotically optimal with an O(1/N) optimality gap for an N-armed problem, under only unichain and aperiodicity assumptions. Our approach departs from most existing work, which focuses either on index or priority policies, whose convergence to the optimum relies on the Global Attractor Property, or on a recently developed simulation-based policy, which requires a Synchronization Assumption.

Funding: Y. Hong and W. Wang are supported in part by the U.S. National Science Foundation (NSF) [Grants ECCS-2145713, CCF-2403194, CCF-2428569, and ECCS-2432545]. Y. Chen is supported in part by the NSF [Grant CCF-2233152]. Q. Xie is supported in part by the NSF [Grants CNS-1955997, ECCS-2339794, and ECCS-2432546].

Supplemental Material: The online appendix is available at https://doi.org/10.1287/moor.2024.0678.
External IDs: doi:10.1287/moor.2024.0678