Clustering with bandit feedback: breaking down the computation/information gap

Published: 18 Dec 2024, Last Modified: 14 Feb 2025. Venue: ALT 2025. License: CC BY 4.0
Abstract: We investigate the Clustering with Bandit feedback Problem (CBP). A learner interacts with an $N$-armed stochastic bandit with $d$-dimensional sub-Gaussian feedback. There exists a hidden partition of the arms into $K$ groups, such that arms within the same group share the same mean vector. The learner's task is to recover this hidden partition with the smallest possible budget, i.e., the fewest observations, and with a probability of error smaller than a prescribed constant $\delta$. In this paper, (i) we derive a non-asymptotic lower bound on the budget, and (ii) we introduce the computationally efficient ACB algorithm, whose budget matches the lower bound in most regimes and improves on the performance of a uniform sampling strategy. Importantly, contrary to the batch setting, we establish that there is no computation-information gap in the bandit setting.
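To make the setting concrete, below is a minimal Python sketch of the CBP environment together with the uniform-sampling baseline that the paper improves upon. Everything here is an illustrative assumption: the Gaussian noise model, the function names (`make_instance`, `pull`, `uniform_sampling_cluster`), and the use of k-means on empirical means are not taken from the paper, and this is not the ACB algorithm.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Hypothetical sketch of the CBP setting: N arms, a hidden partition into K groups,
# and d-dimensional Gaussian (hence sub-Gaussian) feedback around the group mean.
rng = np.random.default_rng(0)

def make_instance(N, K, d, gap=1.0):
    """Draw K group mean vectors and assign each of the N arms to a hidden group."""
    means = gap * rng.normal(size=(K, d))
    labels = rng.integers(K, size=N)  # hidden partition the learner must recover
    return means, labels

def pull(arm, means, labels, sigma=1.0):
    """One bandit pull: noisy d-dimensional observation for the chosen arm."""
    return means[labels[arm]] + sigma * rng.normal(size=means.shape[1])

def uniform_sampling_cluster(N, K, means, labels, budget):
    """Baseline strategy: spread the budget uniformly over arms,
    then cluster the empirical mean vectors into K groups."""
    pulls_per_arm = budget // N
    emp = np.stack([
        np.mean([pull(a, means, labels) for _ in range(pulls_per_arm)], axis=0)
        for a in range(N)
    ])
    _, guess = kmeans2(emp, K, minit="++", seed=1)
    return guess  # estimated partition, up to a relabeling of the groups
```

A toy run of this baseline (again, purely illustrative parameters):

```python
means, labels = make_instance(N=30, K=3, d=5)
guess = uniform_sampling_cluster(N=30, K=3, means=means, labels=labels, budget=30_000)
```

An adaptive strategy such as the paper's ACB can instead concentrate pulls on the arms whose group membership is still ambiguous, which is how it can beat this uniform allocation.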
Submission Number: 73