Keywords: Multi-Task, Best Arm Identification, Risk Constraint, Fixed Confidence
TL;DR: This paper tackles multi-task best arm identification with risk constraints, deriving instance-dependent sample complexity bounds and proposing an optimal algorithm, with extensions to linear bandits.
Abstract: Best arm identification is a challenging problem in sequential decision-making with many real-world applications. Existing works typically assume that all arms are feasible and/or handle expectation-based constraints under strong assumptions, with loose sample complexity bounds and non-optimal algorithms. This paper introduces a multi-task best arm identification problem with risk constraints in the fixed-confidence setting, where each arm has multiple performance metrics. The agent aims to optimize one metric while ensuring that the quantiles of the other metrics remain below specified thresholds for each task. We first derive a tight, instance-dependent lower bound on the sample complexity. Based on this bound, we establish optimality conditions for the static optimal sampling ratio and illustrate how it allocates samples across tasks and constraints while addressing the trade-off between optimality and feasibility. We then derive a Track-and-Stop strategy with asymptotically optimal sample complexity, along with a computationally efficient strategy that iteratively solves the optimality conditions. Finally, we extend our results to the linear bandit setting. Numerical experiments show that our algorithm performs relatively well.
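To make the problem statement in the abstract concrete, the following is a minimal sketch of the selection rule for a single task, computed from already-collected samples. It is not the paper's algorithm: it omits the Track-and-Stop sampling and stopping rules entirely. The function name `best_feasible_arm`, the array layout, the choice to maximize the objective, and the use of a non-strict inequality for the quantile constraint are all assumptions made for illustration.

```python
import numpy as np

def best_feasible_arm(objective, risk, level, threshold):
    """Pick the arm with the highest empirical mean objective among arms
    whose empirical `level`-quantile of the risk metric is <= `threshold`.

    objective, risk: arrays of shape (n_arms, n_samples) for one task
    (hypothetical layout). Returns the arm index, or None if no arm
    looks feasible on the given samples.
    """
    objective = np.asarray(objective, dtype=float)
    risk = np.asarray(risk, dtype=float)

    mean_objective = objective.mean(axis=1)           # one mean per arm
    risk_quantile = np.quantile(risk, level, axis=1)  # one quantile per arm
    feasible = risk_quantile <= threshold

    if not feasible.any():
        return None
    # Infeasible arms are masked out so they can never be selected.
    masked = np.where(feasible, mean_objective, -np.inf)
    return int(np.argmax(masked))

# Toy data: arm 1 has the best mean but violates the risk constraint.
toy_objective = [[1, 1, 1, 1], [3, 3, 3, 3], [2, 2, 2, 2]]
toy_risk = [[0, 0, 0, 0], [5, 5, 5, 5], [1, 1, 1, 1]]
chosen = best_feasible_arm(toy_objective, toy_risk, level=0.9, threshold=2.0)
```

In the multi-task setting described above, a rule of this form would be applied once per task, each with its own thresholds; the toy data shows the trade-off between optimality and feasibility, since the arm with the best mean is excluded by its risk quantile.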
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8455