Abstract: We study the problem of best-arm identification in a distributed variant of the multi-armed bandit
setting, with a central learner and multiple agents. Each agent is associated with an arm of the bandit,
generating stochastic rewards following an unknown distribution. Further, each agent can communicate
the observed rewards to the learner over a bit-constrained channel. We propose a novel quantization
scheme called Inflating Confidence for Quantization (ICQ) that can be applied to existing confidence-
bound based learning algorithms such as Successive Elimination. We analyze the performance of ICQ
applied to Successive Elimination and show that the overall algorithm, named ICQ-SE, has the same
order-optimal sample complexity as the (unquantized) SE algorithm. Moreover, it requires only an
exponentially sparse frequency of communication between the learner and the agents, thus requiring
considerably fewer bits than existing quantization schemes to successfully identify the best arm. We
validate the performance improvement offered by ICQ over other quantization methods through numerical
experiments.
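
For orientation, the unquantized Successive Elimination baseline named above can be sketched roughly as follows. This is a minimal textbook-style sketch, not ICQ and not the authors' implementation: the function name `successive_elimination`, the `pull` callback, the assumption that rewards lie in [0, 1], and the Hoeffding-style confidence radius with its constants are all illustrative assumptions.

```python
import math
import random


def successive_elimination(pull, n_arms, delta=0.05, max_rounds=100_000):
    """Hypothetical helper: return (index of the arm believed best, total pulls).

    `pull(a)` is an assumed callback returning one reward in [0, 1] for arm `a`.
    """
    active = list(range(n_arms))
    sums = [0.0] * n_arms
    total_pulls = 0
    for t in range(1, max_rounds + 1):
        # Sample every surviving arm once per round, so each has t samples.
        for a in active:
            sums[a] += pull(a)
            total_pulls += 1
        # Assumed anytime Hoeffding radius (union bound over arms and rounds).
        radius = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
        means = {a: sums[a] / t for a in active}
        leader = max(means.values())
        # Keep an arm only if its upper bound reaches the leader's lower bound.
        active = [a for a in active if means[a] + radius >= leader - radius]
        if len(active) == 1:
            break
    # If the round budget runs out, fall back to the empirical leader.
    return max(active, key=lambda a: sums[a]), total_pulls


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [0.2, 0.5, 0.8]  # illustrative Bernoulli arms, not from the paper
    best, pulls = successive_elimination(
        lambda a: float(rng.random() < true_means[a]), len(true_means)
    )
    print(best, pulls)
```

In the distributed setting described in the abstract, each call to `pull` would correspond to an agent reporting a reward to the learner; the sketch sends full-precision rewards every round, which is the communication cost that a quantization scheme aims to reduce.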