Indexability of Restless Bandit Problems and Optimality of Whittle Index for Dynamic Multichannel Access
Abstract: In this paper, we consider a class of restless multiarmed
bandit processes (RMABs) that arises in dynamic
multichannel access, user/server scheduling, and optimal activation
in multiagent systems. For this class of RMABs, we establish
indexability and obtain the Whittle index in closed form under both
the discounted and average reward criteria. These results lead to a
direct implementation of the Whittle index policy with remarkably
low complexity. When arms are stochastically identical, we show
that the Whittle index policy is optimal under certain conditions.
Furthermore, it has a semiuniversal structure that obviates the
need to know the Markov transition probabilities. The optimality
and the semiuniversal structure result from the equivalence between
the Whittle index policy and the myopic policy established in
this work. For nonidentical arms, we develop efficient algorithms
for computing a performance upper bound given by Lagrangian
relaxation. The tightness of the upper bound and the near-optimal
performance of the Whittle index policy are illustrated with simulation
examples.
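
To make the myopic policy mentioned in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes a model the abstract does not spell out: each channel is a two-state (good/bad) Markov chain with transition probabilities p01 (bad to good) and p11 (good to good), the user senses k of n channels per slot, and a sensed good channel yields unit reward. All function and parameter names are hypothetical.

```python
import random


def update_belief(omega, p01, p11, observation=None):
    """Belief that a channel is good in the next slot.

    omega: current probability that the channel is good.
    observation: 1 (sensed good), 0 (sensed bad), or None (not sensed).
    Assumed model: two-state Markov channel with P(bad->good) = p01
    and P(good->good) = p11.
    """
    if observation == 1:
        return p11
    if observation == 0:
        return p01
    # Not sensed: propagate the belief one step through the Markov chain.
    return omega * p11 + (1.0 - omega) * p01


def myopic_select(beliefs, k):
    """Indices of the k channels with the largest beliefs (ties: lower index)."""
    order = sorted(range(len(beliefs)), key=lambda i: (-beliefs[i], i))
    return sorted(order[:k])


def simulate(n=4, k=1, p01=0.2, p11=0.8, slots=1000, seed=0):
    """Average per-slot reward of the myopic policy on n identical channels."""
    rng = random.Random(seed)
    stationary = p01 / (p01 + 1.0 - p11)  # stationary P(good)
    states = [1 if rng.random() < stationary else 0 for _ in range(n)]
    beliefs = [stationary] * n
    total = 0
    for _ in range(slots):
        chosen = set(myopic_select(beliefs, k))
        total += sum(states[i] for i in chosen)  # unit reward per good channel
        beliefs = [
            update_belief(beliefs[i], p01, p11,
                          states[i] if i in chosen else None)
            for i in range(n)
        ]
        # Every channel evolves whether or not it was sensed (restless arms).
        states = [
            1 if rng.random() < (p11 if s == 1 else p01) else 0
            for s in states
        ]
    return total / slots
```

For example, `myopic_select([0.3, 0.9, 0.5], 2)` picks channels 1 and 2, and `simulate()` returns the empirical average reward per slot under these assumed parameters.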