Stable batched bandit: Optimal regret with free inference

Ishan Sengupta; Koulik Khamaru

27 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. CC BY 4.0

Keywords: Batched Bandit, Inference in Bandits

Abstract: In this paper, we discuss statistical inference when data are collected by a sequential strategy. While inferential tasks become challenging with sequentially collected data, we argue that this problem can be alleviated when the sequential algorithm satisfies certain stability properties; we call such algorithms stable bandit algorithms. Focusing on batched bandit problems, we first demonstrate that popular algorithms, including the greedy-UCB algorithm and $\epsilon$-greedy ETC algorithms, are not stable, which complicates downstream inferential tasks. Our main result shows that a form of elimination algorithm is stable in the batched bandit setup, and we characterize the asymptotic distribution of the sample means. This result allows us to construct asymptotically exact confidence intervals for the arm means that are sharper than existing concentration-based bounds. As a byproduct of our main results, we propose an Explore-Then-Commit (ETC) strategy that is stable, and therefore allows easy statistical inference, and that also attains optimal regret up to a factor of 4. Our work connects two historically conflicting paradigms in sequential learning environments: regret minimization and statistical inference. Ultimately, we demonstrate that it is possible to minimize regret without sacrificing the ease of performing statistical inference.
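
To make the setting concrete, the following is a minimal sketch of a generic two-batch explore-then-commit loop followed by a normal-approximation interval for each arm mean. It is not the strategy proposed in the paper, whose details are not given in the abstract. The helper names `explore_then_commit` and `wald_interval`, the unit-variance Gaussian rewards, the batch sizes, and the normal limit itself are all assumptions made for illustration. Whether an interval of this form has correct coverage under adaptive sampling is exactly the question that stability addresses, and the sketch does not verify it.

```python
import math
import random


def explore_then_commit(means, horizon, explore_per_arm, seed=0):
    """Generic explore-then-commit on unit-variance Gaussian arms.

    Hypothetical helper for illustration; not the paper's algorithm.
    Returns per-arm pull counts and per-arm sample means.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k

    def pull(arm):
        counts[arm] += 1
        sums[arm] += rng.gauss(means[arm], 1.0)

    # Exploration batch: pull every arm the same number of times.
    for arm in range(k):
        for _ in range(explore_per_arm):
            pull(arm)

    # Commit batch: play the empirically best arm for the remaining rounds.
    best = max(range(k), key=lambda a: sums[a] / counts[a])
    for _ in range(horizon - k * explore_per_arm):
        pull(best)

    return counts, [sums[a] / counts[a] for a in range(k)]


def wald_interval(sample_mean, n, sigma=1.0, z=1.96):
    """Normal-approximation interval: mean +/- z * sigma / sqrt(n)."""
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Assumed toy instance: two arms with true means 0.0 and 0.5.
counts, sample_means = explore_then_commit(
    [0.0, 0.5], horizon=1000, explore_per_arm=100
)
intervals = [wald_interval(m, n) for m, n in zip(sample_means, counts)]
```

The arm chosen in the commit batch receives most of the pulls, so its interval is much narrower than the interval of the arm that was only explored. The number of pulls per arm is itself random because it depends on the observed rewards, which is why the usual fixed-sample-size justification for such intervals does not apply automatically.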

Supplementary Material: zip

Primary Area: reinforcement learning

Submission Number: 8584
