Constrained Pareto Set Identification with Bandit Feedback

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: We study the identification of the Pareto Set in multi-objective bandits under linear feasibility constraints.
Abstract: In this paper, we address the problem of identifying the Pareto Set under feasibility constraints in a multivariate bandit setting. Specifically, given a $K$-armed bandit with unknown means $\mu_1, \dots, \mu_K \in \mathbb{R}^d$, the goal is to identify the set of arms whose mean is not uniformly worse than that of another arm (i.e., not smaller for all objectives), while satisfying some known set of linear constraints, expressing, for example, some minimal performance on each objective. Our focus lies in fixed-confidence identification, for which we introduce an algorithm that significantly outperforms racing-like algorithms and the intuitive two-stage approach that first identifies feasible arms and then their Pareto Set. We further prove an information-theoretic lower bound on the sample complexity of any algorithm for constrained Pareto Set identification, showing that the sample complexity of our approach is near-optimal. Our theoretical results are supported by an extensive empirical evaluation on a series of benchmarks.
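To make the target set concrete, the following minimal sketch computes a constrained Pareto Set when the means are known exactly; it does not reproduce the paper's sampling algorithm. Several details are assumptions made for illustration only: the linear constraints are written as $A\mu \le b$, dominance is taken in the standard sense (no worse on every objective and strictly better on at least one), and the function names and example values are hypothetical.

```python
def is_feasible(mean, A, b):
    """Return True if every linear constraint row . mean <= bound holds.

    Assumption: the feasible region is the polyhedron {x : A x <= b}.
    """
    return all(
        sum(a * m for a, m in zip(row, mean)) <= bound
        for row, bound in zip(A, b)
    )


def dominates(x, y):
    """Return True if x is no worse than y everywhere and better somewhere."""
    return all(xi >= yi for xi, yi in zip(x, y)) and any(
        xi > yi for xi, yi in zip(x, y)
    )


def constrained_pareto_set(means, A, b):
    """Indices of feasible arms not dominated by any other feasible arm."""
    feasible = [i for i, m in enumerate(means) if is_feasible(m, A, b)]
    return [
        i
        for i in feasible
        if not any(dominates(means[j], means[i]) for j in feasible if j != i)
    ]


# Hypothetical example: K = 5 arms, d = 2 objectives (larger is better).
means = [(0.9, 0.3), (0.5, 0.5), (0.4, 0.4), (0.3, 0.9), (0.95, 0.05)]

# Minimal performance of 0.2 on each objective, written as -x_c <= -0.2.
A = [(-1, 0), (0, -1)]
b = (-0.2, -0.2)

result = constrained_pareto_set(means, A, b)
print(result)  # [0, 1, 3]
```

In this example, arm 4 is Pareto-optimal without constraints but violates the minimal performance on the second objective, and arm 2 is feasible but dominated by arm 1. In the bandit setting the means are unknown, so the same set has to be inferred from noisy samples at a prescribed confidence level.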
Lay Summary: We develop a method to identify the best trade-offs among multiple competing objectives (the Pareto Set) while ensuring certain performance constraints are met. Our algorithm is both theoretically efficient and practically effective, outperforming existing approaches in simulations. This has important applications in areas like clinical trials or recommendation systems, where it's crucial to balance multiple goals while respecting safety or fairness constraints.
Primary Area: Theory->Online Learning and Bandits
Keywords: pure exploration, bandit, theoretical analysis, active learning, Pareto set identification, multi-objective bandits
Submission Number: 13304