A Unifying Framework for Online Safe OptimizationDownload PDF

Published: 05 Dec 2022, Last Modified: 05 May 2023MLSW2022Readers: Everyone
Abstract: We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ \emph{long-term constraints}. The goal of the decision maker is to maximize their total reward, while at the same time achieving small cumulative constraints violation across the $T$ rounds. We present the first \emph{best-of-both-world} type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown stochastic model, and in the case in which they are selected at each round by an adversary. Our algorithm is the first to provide guarantees in the adversarial setting with respect to the optimal fixed strategy that satisfies the long-term constraints. In particular, it guarantees a $\rho/(1+\rho)$ fraction of the optimal reward and sublinear regret, where $\rho$ is a feasibility parameter related to the existence of strictly feasible solutions. Our framework employs traditional regret minimizers as black-box components. Therefore, by instantiating it with an appropriate choice of regret minimizers it can handle the \emph{full-feedback} as well as the \emph{bandit-feedback} setting. Moreover, it allows the decision maker to seamlessly handle scenarios with non-convex rewards and constraints. We show how our framework can be applied in the context of budget-management mechanisms for repeated auctions in order to guarantee long-term constraints that are not \emph{packing} (\emph{e.g.}, ROI constraints).
1 Reply