Understanding Exploration in Bandits with Switching Constraints: A Batched Approach in Fixed-Confidence Pure Exploration
Keywords: Multi-Armed Bandits, Fixed-Confidence Pure Exploration, Switching Constraints, Batched Bandits
TL;DR: A practical, well-defined solution to pure exploration under switching constraints, obtained via a batched reformulation of the problem.
Abstract: Most multi-armed bandit algorithms focus on efficient exploration while remaining oblivious to constraints on exploration, such as the cost of switching between arms. Switching costs arise in real-world applications such as personalized medicine, where changing treatment may require a wash-out period during which the patient takes no drug, and in industrial settings, where reconfiguring production is costly. Unfortunately, controlling for switching remains largely understudied outside of regret minimization. In this work, we present a formulation of the fixed-confidence pure exploration problem with constraints on the arm-switching frequency. We show how this problem lends itself to a batched bandit formulation and give a lower bound on the exploration time of any such algorithm. We translate this idea into two algorithms inspired by the track-and-stop framework, adapted to batched play with a limited number of arm switches per batch. Finally, we show empirically that our approach achieves short stopping times, comparable to those of unconstrained algorithms, even under a minimal switching limit.
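The abstract does not specify how a track-and-stop target allocation is turned into a batch that respects the per-batch switch limit. Purely as an illustration, and not as the authors' algorithm, the sketch below shows one simple way to do it. It assumes that the allocation weights come from some track-and-stop oracle, that switches are counted only within a batch, and that the helpers `plan_batch` and `count_switches` are hypothetical names. Playing each selected arm in a single contiguous block means that m distinct arms cost exactly m - 1 switches, so the sketch keeps at most `max_switches + 1` arms and rounds their renormalized weights to integer pull counts.

```python
from typing import List, Sequence


def plan_batch(weights: Sequence[float], batch_size: int, max_switches: int) -> List[int]:
    """Hypothetical helper: turn allocation weights into one batch of pulls.

    Assumes nonnegative weights (e.g. from a track-and-stop oracle). The
    returned plan has batch_size pulls and at most max_switches switches.
    """
    if batch_size <= 0 or max_switches < 0:
        raise ValueError("batch_size must be positive and max_switches nonnegative")
    # Keep only the (max_switches + 1) heaviest arms: contiguous blocks of
    # m distinct arms need exactly m - 1 switches.
    keep = sorted(range(len(weights)), key=lambda a: weights[a], reverse=True)
    keep = keep[: max_switches + 1]
    total = sum(weights[a] for a in keep)
    if total <= 0:
        raise ValueError("kept arms must have positive total weight")
    # Largest-remainder rounding of batch_size * (renormalized weight).
    exact = {a: batch_size * weights[a] / total for a in keep}
    counts = {a: int(exact[a]) for a in keep}
    leftover = batch_size - sum(counts.values())
    by_remainder = sorted(keep, key=lambda a: exact[a] - counts[a], reverse=True)
    for a in by_remainder[:leftover]:
        counts[a] += 1
    # Play each kept arm in one contiguous block, in arm-index order.
    plan: List[int] = []
    for a in sorted(keep):
        plan.extend([a] * counts[a])
    return plan


def count_switches(plan: Sequence[int]) -> int:
    """Number of consecutive pulls whose arms differ."""
    return sum(1 for prev, cur in zip(plan, plan[1:]) if prev != cur)


if __name__ == "__main__":
    plan = plan_batch([0.5, 0.3, 0.2], batch_size=10, max_switches=1)
    print(plan, count_switches(plan))
```

With weights (0.5, 0.3, 0.2), a batch of 10 pulls, and a single allowed switch, the sketch plays arm 0 six times and then arm 1 four times. Truncating to the heaviest arms can starve the remaining arms within a batch. A complete algorithm would need some forced exploration or rotation across batches to keep track-and-stop's guarantees, and this sketch omits that.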
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Newton_Mwai2
Track: Regular Track: unpublished work
Submission Number: 60