Keywords: Pure exploration, Unknown linear constraints, Bandits, Lagrangian optimization
TL;DR: We study the impact of unknown linear constraints on pure exploration in multi-armed bandits. We provide a Lagrangian relaxation of the lower bound and leverage it to derive two novel algorithms with efficient numerical performance.
Abstract: Pure exploration in bandits can model eclectic real-world decision-making problems, such as tuning hyper-parameters or conducting user studies, where sample frugality is desired. Thus, considering different safety, resource, and fairness constraints on the decision space has gained increasing attention. In this paper, we study a generalisation of these problems as pure exploration in multi-armed bandits with unknown linear constraints. First, we propose a Lagrangian relaxation of the sample complexity lower bound for pure exploration. We further derive how this lower bound converges to the existing lower bound for pure exploration under known constraints, and how the hardness of the problem changes with the geometry induced by the constraint estimation procedure. We then leverage the Lagrangian lower bound and properties of convex optimisation to propose two computationally efficient extensions of Track-and-Stop and Gamified Explorations, namely LATS and LAGEX. Designing these algorithms requires us to propose a new constraint-adaptive stopping rule and, at each step, to use pessimistic estimates of the constraints in the Lagrangian lower bound. We show that these algorithms asymptotically achieve the desired sample complexity bounds. Finally, we conduct numerical experiments with different reward distributions and constraints that validate the efficient performance of LAGEX and LATS with respect to baselines.
Submission Number: 109