Abstract: Restless multi-armed bandits (RMAB) are a popular framework for modeling resource allocation and scheduling problems arising in various applications. Such applications can be modeled as Markov decision processes (MDP), but optimal or near-optimal solutions obtained through dynamic programming suffer from high complexity. RMAB provides a heuristic solution whose complexity scales linearly with the number of alternatives. However, these heuristic solutions are derived under the assumption that the models of all arms are known perfectly. In this paper, we consider RMAB with uncertainty in the rewards and dynamics of the arms. In such a setting, using a robust MDP solution is not possible due to high computational complexity. Instead, we consider a certainty equivalence approach and bound the additional loss in performance due to model inaccuracy. Our bounds are stated directly in terms of the model uncertainty of each arm, and we illustrate their use via examples.
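The certainty-equivalence idea described above — plug the estimated model in as if it were the true one and run the standard per-arm index heuristic — can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the two-state arms, the myopic (one-step) index, and all numerical values are assumptions made purely for the example. The performance gap between the oracle policy and this certainty-equivalence policy is exactly the kind of loss the paper's bounds control.

```python
import numpy as np

def myopic_indices(states, est_R):
    # Certainty equivalence: score each arm by its ESTIMATED reward for
    # being played in its current state, as if the estimate were exact.
    return np.array([est_R[a][s] for a, s in enumerate(states)])

def step(states, actions, true_P, rng):
    # Each arm evolves under its TRUE dynamics (restless: it transitions
    # whether or not it was played). true_P[a][action][state] is a
    # distribution over next states.
    return [rng.choice(len(true_P[a][actions[a]][s]),
                       p=true_P[a][actions[a]][s])
            for a, s in enumerate(states)]

def run(n_arms, budget, horizon, true_P, true_R, est_R, seed=0):
    # Play the `budget` arms with the highest estimated index each round;
    # collect reward under the TRUE model. Complexity per round is linear
    # in the number of arms, as in the RMAB heuristic.
    rng = np.random.default_rng(seed)
    states, total = [0] * n_arms, 0.0
    for _ in range(horizon):
        idx = myopic_indices(states, est_R)
        played = np.argsort(idx)[-budget:]
        actions = [1 if a in played else 0 for a in range(n_arms)]
        total += sum(true_R[a][states[a]] for a in played)
        states = step(states, actions, true_P, rng)
    return total

# Hypothetical two-arm example with deterministic dynamics (arms stay in
# state 0). The estimated rewards are wrong: arm 1 looks better than arm 0,
# so the certainty-equivalence policy plays the inferior arm.
stay = [[1.0, 0.0], [1.0, 0.0]]          # P(next state | current state)
true_P = [[stay, stay], [stay, stay]]     # [arm][action][state] -> dist
true_R = [[1.0, 0.0], [0.5, 0.0]]         # true reward when played
est_R  = [[0.2, 0.0], [0.9, 0.0]]         # misestimated model
loss_demo = run(2, budget=1, horizon=2,
                true_P=true_P, true_R=true_R, est_R=est_R)  # -> 1.0 (oracle: 2.0)
```

With an accurate model (`est_R = true_R`) the same policy would play arm 0 and collect 2.0, so the model inaccuracy costs 1.0 here; the paper's bounds quantify such losses in terms of each arm's model uncertainty.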
External IDs: dblp:journals/anor/SinhaM25