NeurWIN: Neural Whittle Index Network for Restless Bandits via Deep RL

Khaled Nakhleh; Santosh Ganji; Ping-Chun Hsieh; I-Hong Hou; Srinivas Shakkottai

28 Sept 2020 (modified: 22 Jun 2025) | ICLR 2021 Conference Blind Submission | Readers: Everyone

Keywords: deep reinforcement learning, restless bandits, Whittle index

Abstract: The Whittle index policy is a powerful tool for obtaining asymptotically optimal solutions to the notoriously intractable problem of restless bandits. However, finding the Whittle indices remains difficult for many practical restless bandits with convoluted transition kernels. This paper proposes NeurWIN, a neural Whittle index network that seeks to learn the Whittle indices of any restless bandit by leveraging mathematical properties of the Whittle indices. We show that a neural network that produces the Whittle index is also one that produces the optimal control for a set of Markov decision problems. This property motivates using deep reinforcement learning to train NeurWIN. We demonstrate the utility of NeurWIN by evaluating its performance on three recently studied restless bandit problems. Our experimental results show that NeurWIN performs better than, or as well as, state-of-the-art policies on all three problems.
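
For readers unfamiliar with index policies, the following is a minimal sketch of how Whittle indices are typically used once they are available. It is not taken from the paper's code. The function names `whittle_index_policy` and `threshold_control`, the activation budget `num_active`, and the tie-breaking conventions are assumptions made for illustration. The first function activates the arms with the largest indices. The second shows the single-arm control that an index induces for a given activation cost, which is one common way to read the "set of Markov decision problems" mentioned in the abstract.

```python
import numpy as np


def whittle_index_policy(indices, num_active):
    """Activate the `num_active` arms with the largest index values.

    `indices` holds one (estimated) Whittle index per arm for its current
    state. Returns a 0/1 vector with exactly `num_active` ones.
    Ties are broken by arm position (an illustrative convention).
    """
    indices = np.asarray(indices, dtype=float)
    # Sort arms by descending index; a stable sort keeps earlier arms first on ties.
    order = np.argsort(-indices, kind="stable")
    actions = np.zeros(indices.shape[0], dtype=int)
    actions[order[:num_active]] = 1
    return actions


def threshold_control(index, activation_cost):
    """Single-arm control induced by an index value.

    Activate the arm (return 1) when its index exceeds the activation cost.
    Using a strict inequality at equality is an assumed convention.
    """
    return int(index > activation_cost)


if __name__ == "__main__":
    # Hypothetical index values for four arms, with a budget of two activations.
    print(whittle_index_policy([0.2, 1.5, -0.3, 0.9], num_active=2))
    print(threshold_control(0.9, activation_cost=0.5))
```

In this sketch each arm's index depends only on that arm's own state, which matches the one-sentence summary's point that the index of an arm is learned independently of the other arms.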

One-sentence Summary: New deep RL algorithm for learning the Whittle index of a restless arm independently of other arms.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Supplementary Material: zip

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/neurwin-neural-whittle-index-network-for/code)

Reviewed Version (pdf): https://openreview.net/references/pdf?id=VRRrCNfyFB

