Understanding Exploration in Bandits with Switching Constraints: A Batched Approach in Fixed-Confidence Pure Exploration

Newton Mwai; Milad Malekipirbazari; Fredrik D. Johansson

Published: 17 Jul 2025, Last Modified: 07 Oct 2025, EWRL 2025 Poster, CC BY 4.0

Keywords: Multi-Armed Bandits, Fixed-Confidence Pure Exploration, Switching Constraints, Batched Bandits

TL;DR: A practical and well-defined solution for pure exploration with switching constraints, with a batched reformulation of the problem.

Abstract: Most multi-armed bandit algorithms focus on efficient exploration while often ignoring constraints tied to exploration, such as the cost of switching between arms. Switching costs arise in real-world applications such as personalized medicine, where a change in treatment may require a wash-out period during which the patient takes no drug, and in industrial applications, where reconfiguring production is costly. Unfortunately, controlling switching remains significantly understudied outside of regret minimization. In this work, we present a formulation of the fixed-confidence pure exploration problem with constraints on the arm-switching frequency. We show how this problem lends itself to batched bandits and give a lower bound on the exploration time of any such algorithm. We translate this idea into two algorithms inspired by the Track-and-Stop framework, adapted to batched plays with a limited number of arm switches per batch. Finally, we demonstrate empirically that our approach achieves short stopping times, comparable to those of unconstrained algorithms, even under a minimal switching limit.
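To make the idea of a per-batch switching limit concrete, here is a minimal Python sketch. It assumes that a batch is a sequence of `batch_size` arm pulls, that a switch is any change of arm between consecutive pulls within the batch (switches across batch boundaries are ignored here), and that target sampling proportions `weights` are already available, for example from a Track-and-Stop style optimization. The helper names `build_batch` and `count_switches`, and the rule of splitting the batch among the `max_switches + 1` arms with the largest target weights, are hypothetical illustrations, not the authors' algorithms.

```python
from typing import List, Sequence


def count_switches(plays: Sequence[int]) -> int:
    """Count changes of arm between consecutive pulls within one batch."""
    return sum(1 for prev, cur in zip(plays, plays[1:]) if prev != cur)


def build_batch(weights: Sequence[float], batch_size: int, max_switches: int) -> List[int]:
    """Hypothetical batch builder with at most `max_switches` switches.

    Assumes all weights are non-negative and at least one chosen arm has
    positive weight. Only the `max_switches + 1` arms with the largest target
    weights are played, each in a single contiguous block.
    """
    k = min(len(weights), max_switches + 1)
    # Arms with the largest target proportions (ties keep the lower index first).
    top = sorted(range(len(weights)), key=lambda i: -weights[i])[:k]
    total = sum(weights[i] for i in top)
    # Renormalize over the chosen arms, then round with the largest-remainder
    # rule so that the pull counts sum exactly to batch_size.
    raw = [batch_size * weights[i] / total for i in top]
    counts = [int(r) for r in raw]
    remainder = batch_size - sum(counts)
    by_fraction = sorted(range(k), key=lambda j: -(raw[j] - counts[j]))
    for j in by_fraction[:remainder]:
        counts[j] += 1
    # One contiguous block per arm gives at most k - 1 <= max_switches switches.
    plays: List[int] = []
    for arm, c in zip(top, counts):
        plays.extend([arm] * c)
    return plays


if __name__ == "__main__":
    batch = build_batch([0.5, 0.3, 0.2], batch_size=10, max_switches=1)
    print(batch, count_switches(batch))
```

For instance, with weights `[0.5, 0.3, 0.2]`, `batch_size=10` and `max_switches=1`, the batch plays arm 0 six times and then arm 1 four times, a single switch. Grouping each arm's pulls into one contiguous block is what keeps the switch count at most the number of arms used minus one; the price, under this assumed rule, is that arms outside the top `max_switches + 1` receive no pulls in that batch.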

Track: Regular Track: unpublished work

Submission Number: 60

Loading