Abstract: Restless multi-armed bandits (RMAB) are a popular framework for modeling resource allocation and scheduling problems arising in various applications. Such applications can be modeled as Markov decision processes (MDP), but optimal or near-optimal solutions obtained through dynamic programming suffer from high complexity. RMAB provides a heuristic solution whose complexity scales linearly with the number of alternatives. However, these heuristic solutions are derived under the assumption that the models of all arms are known perfectly. In this paper, we consider RMAB with uncertainty in the rewards and dynamics of the arms. In such a setting, using a robust MDP solution is not possible due to high computational complexity. Instead, we consider a certainty equivalence approach and bound the additional loss in performance due to model inaccuracy. Our bounds are stated directly in terms of the model uncertainty of each arm, and we illustrate their use via examples.
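The certainty-equivalence idea described above — plug the estimated model in as if it were the true one and run the standard per-arm index heuristic — can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the two-state arms, the myopic (one-step) index, and all numerical values are assumptions made purely for the example. The performance gap between the oracle policy and this certainty-equivalence policy is exactly the kind of loss the paper's bounds control.

```python
import numpy as np

def myopic_indices(states, est_R):
    # Certainty equivalence: score each arm by its ESTIMATED reward for
    # being played in its current state, as if the estimate were exact.
    return np.array([est_R[a][s] for a, s in enumerate(states)])

def step(states, actions, true_P, rng):
    # Each arm evolves under its TRUE dynamics (restless: it transitions
    # whether or not it was played). true_P[a][action][state] is a
    # distribution over next states.
    return [rng.choice(len(true_P[a][actions[a]][s]),
                       p=true_P[a][actions[a]][s])
            for a, s in enumerate(states)]

def run(n_arms, budget, horizon, true_P, true_R, est_R, seed=0):
    # Play the `budget` arms with the highest estimated index each round;
    # collect reward under the TRUE model. Complexity per round is linear
    # in the number of arms, as in the RMAB heuristic.
    rng = np.random.default_rng(seed)
    states, total = [0] * n_arms, 0.0
    for _ in range(horizon):
        idx = myopic_indices(states, est_R)
        played = np.argsort(idx)[-budget:]
        actions = [1 if a in played else 0 for a in range(n_arms)]
        total += sum(true_R[a][states[a]] for a in played)
        states = step(states, actions, true_P, rng)
    return total

# Hypothetical two-arm example with deterministic dynamics (arms stay in
# state 0). The estimated rewards are wrong: arm 1 looks better than arm 0,
# so the certainty-equivalence policy plays the inferior arm.
stay = [[1.0, 0.0], [1.0, 0.0]]          # P(next state | current state)
true_P = [[stay, stay], [stay, stay]]     # [arm][action][state] -> dist
true_R = [[1.0, 0.0], [0.5, 0.0]]         # true reward when played
est_R  = [[0.2, 0.0], [0.9, 0.0]]         # misestimated model
loss_demo = run(2, budget=1, horizon=2,
                true_P=true_P, true_R=true_R, est_R=est_R)  # -> 1.0 (oracle: 2.0)
```

With an accurate model (`est_R = true_R`) the same policy would play arm 0 and collect 2.0, so the model inaccuracy costs 1.0 here; the paper's bounds quantify such losses in terms of each arm's model uncertainty.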
External IDs: dblp:journals/anor/SinhaM25