Efficient Computation of Whittle Index for Partially Observable Restless Bandits

Qizhen Jia, Keqin Liu

Published: 08 Mar 2026, Last Modified: 30 Apr 2026 | OpenReview Archive Direct Upload | CC BY 4.0

Abstract: The restless multi-armed bandit (RMAB) models a sequential allocation problem in which arms evolve even while passive (not selected) and observations may be partial. Even finite-state RMABs with perfect observation are computationally intractable (PSPACE-hard), which motivates scalable index-based decision rules. Gittins indices solve the classical bandit, whereas Whittle’s relaxation extends indexation to the restless setting and has enabled broad applications in communications, queueing, and public health. However, in partially observable models the cost of computing Whittle indices can dominate runtime due to (i) belief-space blowup and (ii) repeated linear solves within partial conservation law (PCL)/adaptive-greedy (AG) workflows. In this paper, we present an implementation-oriented optimization pipeline that leaves interfaces and index logic unchanged yet substantially accelerates end-to-end evaluation through three components: (i) hash-guided belief deduplication with ε-radius merging, (ii) shared linear factorization reused across multiple right-hand sides (multi-RHS), and (iii) batch vectorization with memoized intermediates. We observe large speedups while keeping Whittle–myopic runtime gaps within a small, controlled band. Our study complements modeling and theory by focusing on the numerical and systems aspects that make index computation efficient for larger instances and tighter time budgets. We position our work alongside classic results on complexity and index policies [1]–[4] and recent advances in RMAB algorithms and applications [5]–[12].
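As an illustration of component (i), hash-guided belief deduplication with ε-radius merging might look roughly like the sketch below. The abstract does not specify the data structure, so everything here is an assumption made for illustration: the class name `BeliefDeduplicator`, the uniform grid used as the hash key, and the max-norm distance are hypothetical choices, not the authors' implementation.

```python
import math
from itertools import product


class BeliefDeduplicator:
    """Map belief vectors to shared representatives (hypothetical helper).

    A belief is merged into an existing representative when every
    coordinate differs by at most eps (max-norm). Representatives are
    indexed by a grid cell of side eps, which acts as the hash key.
    """

    def __init__(self, eps):
        self.eps = eps
        self.buckets = {}  # grid cell -> list of representative ids
        self.reps = []     # representative belief vectors

    def _cell(self, belief):
        # Quantize each coordinate to an integer grid index.
        return tuple(math.floor(x / self.eps) for x in belief)

    def add(self, belief):
        """Return the id of the representative for `belief`, creating one if needed."""
        cell = self._cell(belief)
        # Two points within eps of each other in max-norm always fall in
        # the same or an adjacent cell, so only 3^d cells are scanned.
        for offset in product((-1, 0, 1), repeat=len(cell)):
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            for rep_id in self.buckets.get(neighbor, ()):
                rep = self.reps[rep_id]
                if max(abs(a - b) for a, b in zip(rep, belief)) <= self.eps:
                    return rep_id
        # No representative is close enough: store this belief as a new one.
        rep_id = len(self.reps)
        self.reps.append(tuple(belief))
        self.buckets.setdefault(cell, []).append(rep_id)
        return rep_id


# Usage: the first two beliefs differ by 4e-4 per coordinate and are merged.
dedup = BeliefDeduplicator(eps=1e-3)
ids = [dedup.add(b) for b in [(0.2, 0.8), (0.2004, 0.7996), (0.5, 0.5)]]
print(ids)  # [0, 0, 1]
```

Under these assumptions, downstream value or index computations would run once per representative id rather than once per raw belief. Scanning all 3^d neighboring cells only makes sense when the per-arm state dimension d is small; a different neighbor search would be needed otherwise.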