ICQ: A Quantization Scheme for Best-Arm Identification Over Bit-Constrained Channels

Published: 01 Jan 2023, Last Modified: 28 Apr 2025. Venue: WiOpt 2023. License: CC BY-SA 4.0
Abstract: We study the problem of best-arm identification in a distributed variant of the multi-armed bandit setting, with a central learner and multiple agents. Each agent is associated with an arm of the bandit and generates stochastic rewards from a distribution that is a priori unknown to the learner. Further, each agent can communicate its observed rewards to the learner only over a bit-constrained channel. We propose a novel quantization scheme called ICQ that can be applied to existing confidence-bound-based learning algorithms such as Successive Elimination and requires only exponentially sparse communication between the learner and the agents. We analyze the performance of ICQ applied to Successive Elimination and show that the resulting algorithm, which we call ICQ-SE, has order-optimal sample complexity and uses considerably fewer bits than existing quantization schemes to identify the best arm. Numerical experiments corroborate these findings.
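For readers unfamiliar with the base algorithm, the following is a minimal sketch of standard Successive Elimination, the confidence-bound-based procedure that ICQ is applied to. It is not the paper's ICQ-SE: the abstract does not specify the ICQ quantizer or its communication schedule, so rewards here are passed to the learner unquantized in every round. The function names, the Bernoulli arms, the Hoeffding-style confidence radius, and the parameter values are all illustrative assumptions.

```python
import math
import random


def bernoulli_arm(p, rng):
    """Hypothetical arm: returns 1.0 with probability p, else 0.0."""
    return lambda: 1.0 if rng.random() < p else 0.0


def successive_elimination(arms, delta=0.05, max_rounds=100_000):
    """Plain (unquantized) Successive Elimination for rewards in [0, 1].

    arms: list of zero-argument callables, each returning one reward.
    Returns (index of the surviving arm, number of rounds used).
    """
    k = len(arms)
    active = list(range(k))
    sums = [0.0] * k
    for t in range(1, max_rounds + 1):
        # Every active arm is pulled once per round, so each has t samples.
        for i in active:
            sums[i] += arms[i]()
        # Hoeffding radius with a union bound over arms and rounds
        # (one common choice; an assumption, not taken from the paper).
        radius = math.sqrt(math.log(4 * k * t * t / delta) / (2 * t))
        means = {i: sums[i] / t for i in active}
        best_mean = max(means.values())
        # Drop arm i once its upper bound falls below the leader's lower bound.
        active = [i for i in active if means[i] + radius >= best_mean - radius]
        if len(active) == 1:
            return active[0], t
    # Budget exhausted: fall back to the empirically best remaining arm.
    return max(active, key=lambda i: sums[i]), max_rounds


if __name__ == "__main__":
    rng = random.Random(0)
    arms = [bernoulli_arm(p, rng) for p in (0.2, 0.5, 0.9)]
    best, rounds = successive_elimination(arms)
    print(f"best arm: {best}, rounds used: {rounds}")
```

In this baseline every reward crosses the channel at full precision. A quantized variant such as the one the abstract describes would instead send compressed reward information at sparse intervals, which is where the bit savings would come from.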