QWI: Q-learning with Whittle Index

Francisco Robledo, Vivek S. Borkar, Urtzi Ayesta, Konstantin Avrachenkov

Published: 2022, Last Modified: 10 May 2023. SIGMETRICS Perform. Evaluation Rev. 2022. Readers: Everyone

Abstract: The Whittle index policy is a heuristic that has shown remarkably good performance (with guaranteed asymptotic optimality) when applied to the class of problems known as multi-armed restless bandits. In this paper we develop QWI, an algorithm based on Q-learning, in order to learn the Whittle indices. The key feature is the deployment of two timescales: a relatively faster one to update the state-action Q-functions, and a relatively slower one to update the Whittle indices. In our main result, we show that the algorithm converges to the Whittle indices of the problem. Numerical computations show that our algorithm converges much faster than both the standard Q-learning algorithm and neural-network-based approximate Q-learning.
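
The two-timescale idea in the abstract can be illustrated with a minimal sketch for a single arm. It is not the authors' implementation: the discounted formulation, the uniform exploration, the step-size schedules, and all names (`qwi_sketch`, `P_demo`, `R_demo`) are assumptions made for illustration. For each reference state x, the sketch keeps a Q-table under a passivity subsidy `lam[x]`, updates the Q-tables with the larger step size `alpha`, and nudges `lam[x]` with the smaller step size `beta` towards the value at which the active and passive actions are equally attractive in x.

```python
import numpy as np


def qwi_sketch(P, R, gamma=0.9, n_steps=20000, seed=0):
    """Two-timescale Q-learning sketch for Whittle indices of one arm.

    P: array (2, S, S), transition matrices for passive (0) and active (1).
    R: array (2, S), rewards for passive (0) and active (1).
    Returns lam: array (S,), one index estimate per reference state.
    """
    rng = np.random.default_rng(seed)
    n_states = P.shape[1]
    # Q[x, s, a]: Q-function of the problem whose passive subsidy is lam[x].
    Q = np.zeros((n_states, n_states, 2))
    lam = np.zeros(n_states)
    visits = np.zeros((n_states, 2))
    idx = np.arange(n_states)
    s = 0
    for n in range(1, n_steps + 1):
        a = int(rng.integers(2))  # uniform exploration (assumption)
        s_next = int(rng.choice(n_states, p=P[a, s]))
        visits[s, a] += 1
        # Assumed schedules: beta / alpha -> 0, so lam moves on the slower timescale.
        alpha = visits[s, a] ** -0.6
        beta = 1.0 / (1.0 + n)
        # Fast timescale: the passive action earns the subsidy lam[x] on top of R[0, s].
        reward = R[a, s] + (1 - a) * lam  # shape (S,), one entry per reference state x
        target = reward + gamma * Q[:, s_next, :].max(axis=1)
        Q[:, s, a] += alpha * (target - Q[:, s, a])
        # Slow timescale: move lam[x] towards indifference between actions in state x.
        lam += beta * (Q[idx, idx, 1] - Q[idx, idx, 0])
        s = s_next
    return lam


# Hypothetical 3-state arm, used only to exercise the sketch.
P_demo = np.array([
    [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]],  # passive
    [[0.2, 0.3, 0.5], [0.1, 0.4, 0.5], [0.1, 0.2, 0.7]],  # active
])
R_demo = np.array([
    [0.0, 0.0, 0.0],  # passive rewards
    [1.0, 0.5, 0.2],  # active rewards
])
lam_demo = qwi_sketch(P_demo, R_demo, n_steps=2000)
```

With step sizes this simple, the estimates in `lam_demo` should be read as a demonstration of the update structure rather than as converged indices; the convergence result in the abstract relies on the conditions stated in the paper.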
