Safe Online Convex Optimization with First-Order Feedback

Spencer Hutchinson, Mahnoosh Alizadeh

Published: 2024, Last Modified: 30 Sept 2024, Venue: ACC 2024, License: CC BY-SA 4.0

Abstract: We study an online convex optimization problem where the player must satisfy an unknown constraint at all rounds, while only observing the gradient and function value of the constraint at the chosen actions. For this problem, we develop an algorithm that uses an optimistic set, which overestimates the constraint, to identify low-regret actions while using a pessimistic set, which underestimates the constraint, to ensure constraint satisfaction. Our analysis shows that this algorithm satisfies the constraint at all rounds while enjoying $\mathcal{O}(\sqrt{T})$ regret when the constraint function is smooth and strongly convex. We then extend our algorithm to a setting with time-varying constraints and prove that it enjoys similar guarantees in this setting. Lastly, we demonstrate the effectiveness of our algorithm with a set of numerical experiments.
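To make the optimistic/pessimistic idea concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the constraint is written as g(x) <= 0, that g is mu-strongly convex and L-smooth with both constants known, and that each round yields one observation (x, g(x), grad g(x)). Under these assumptions, strong convexity gives a quadratic lower bound on g and smoothness gives a quadratic upper bound, so the lower bounds define a superset of the feasible set and the upper bounds define a subset. The names `quadratic_bound`, `in_optimistic_set`, and `in_pessimistic_set`, and the example constraint, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# One first-order observation: (query point x, value g(x), gradient of g at x).
Observation = Tuple[Vector, float, Vector]


def quadratic_bound(obs: Observation, y: Vector, curvature: float) -> float:
    """Return g(x) + <grad g(x), y - x> + (curvature / 2) * ||y - x||^2."""
    x, value, grad = obs
    diff = [yi - xi for yi, xi in zip(y, x)]
    linear = sum(gi * di for gi, di in zip(grad, diff))
    sq_norm = sum(di * di for di in diff)
    return value + linear + 0.5 * curvature * sq_norm


def in_optimistic_set(history: List[Observation], y: Vector, mu: float) -> bool:
    # Strong convexity: every bound lies below g, so a feasible y passes all of them.
    # The resulting set therefore contains the true feasible set.
    return all(quadratic_bound(obs, y, mu) <= 0.0 for obs in history)


def in_pessimistic_set(history: List[Observation], y: Vector, smooth: float) -> bool:
    # Smoothness: every bound lies above g, so one bound <= 0 certifies g(y) <= 0.
    # The resulting set is therefore contained in the true feasible set.
    return any(quadratic_bound(obs, y, smooth) <= 0.0 for obs in history)


# Hypothetical example constraint: g(x) = x1^2 + 2 * x2^2 - 1 (mu = 2, L = 4).
def g(x: Vector) -> float:
    return x[0] ** 2 + 2.0 * x[1] ** 2 - 1.0


def grad_g(x: Vector) -> List[float]:
    return [2.0 * x[0], 4.0 * x[1]]


MU, SMOOTH = 2.0, 4.0
x0 = [0.0, 0.0]  # a point assumed to be known safe
history: List[Observation] = [(x0, g(x0), grad_g(x0))]
```

With the single observation at the origin, the point (0.5, 0) is certified safe by the pessimistic set, (0.9, 0) is feasible but not yet certified, and (0, 0.9) is infeasible yet still inside the optimistic set. More observations tighten both sets toward the true feasible set.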