Clus-UCB: A Near-Optimal Algorithm for Clustered Bandits

TMLR Paper5652 Authors

16 Aug 2025 (modified: 27 Aug 2025). Under review for TMLR. CC BY 4.0
Abstract: We study a stochastic multi-armed bandit setting in which the arms are partitioned into known clusters such that the mean rewards of arms within a cluster differ by at most a known threshold. The clustering structure is known a priori, but the arm means are unknown. We derive an asymptotic lower bound on the regret that improves upon the classical bound of Lai & Robbins (1985). We then propose Clus-UCB, an efficient algorithm that closely matches this lower bound asymptotically. Clus-UCB exploits the clustering structure through a new index for evaluating an arm that depends on the other arms in its cluster, so that arms share information with one another. We present simulation results comparing Clus-UCB against KL-UCB and other well-known algorithms for bandits with dependent arms. Finally, we discuss limitations of this work and outline possible directions for future research.
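To make the setting in the abstract concrete, the following is a minimal sketch, assuming Bernoulli rewards, of (i) the within-cluster constraint on the arm means and (ii) the coefficient of log T in the classical Lai & Robbins (1985) lower bound that the paper reports improving upon. The function names, the example means, and the threshold value are illustrative assumptions; the paper's own lower bound and the Clus-UCB index are not reproduced here.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence KL(Ber(p) || Ber(q)), for p and q strictly inside (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def satisfies_cluster_constraint(means, clusters, eps):
    """Return True if, within every cluster, the arm means differ by at most eps."""
    return all(
        max(means[a] for a in cluster) - min(means[a] for a in cluster) <= eps
        for cluster in clusters
    )


def lai_robbins_constant(means):
    """Coefficient of log T in the classical lower bound (Bernoulli arms).

    Sums gap / KL(mean, best_mean) over the suboptimal arms only.
    """
    best = max(means)
    return sum((best - m) / bernoulli_kl(m, best) for m in means if m < best)


# Hypothetical instance: four arms in two clusters, threshold eps = 0.1.
means = [0.9, 0.85, 0.5, 0.45]
clusters = [[0, 1], [2, 3]]
eps = 0.1

if __name__ == "__main__":
    print("constraint holds:", satisfies_cluster_constraint(means, clusters, eps))
    print("classical log T coefficient:", lai_robbins_constant(means))
```

The classical coefficient ignores the cluster information entirely, which is why a bound that accounts for the known clusters and threshold can be tighter.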
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Junpei_Komiyama1
Submission Number: 5652