Triple-Optimistic Learning for Stochastic Contextual Bandits with General Constraints

Published: 01 May 2025, Last Modified: 18 Jun 2025. ICML 2025 poster. License: CC BY 4.0
Abstract: We study contextual bandits with general constraints, where a learner observes contexts and aims to maximize cumulative rewards while satisfying a wide range of general constraints. We introduce the Optimistic$^3$ framework, a novel learning and decision-making approach that integrates optimistic design into parameter learning, primal decision, and dual violation adaptation (i.e., triple optimism), combined with an efficient primal-dual architecture. Optimistic$^3$ achieves $\tilde{O}(\sqrt{T})$ regret and constraint violation for contextual bandits with general constraints. This framework not only outperforms the state-of-the-art results that achieve $\tilde{O}(T^{\frac{3}{4}})$ guarantees when Slater's condition does not hold, but also improves on previous results that achieve $\tilde{O}(\sqrt{T}/\delta)$ when Slater's condition holds (where $\delta$ denotes the Slater condition parameter), offering an $O(1/\delta)$ improvement. This improvement is significant because $\delta$ can be arbitrarily small when constraints are particularly challenging. Moreover, we show that Optimistic$^3$ can be extended to classical multi-armed bandits with both stochastic and adversarial constraints, recovering the best-of-both-worlds guarantee established in state-of-the-art works, but with significantly less computational overhead.
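To make the primal-dual idea concrete, here is a minimal toy sketch of a *generic* optimistic primal-dual bandit with a single cost constraint. This is not the paper's Optimistic$^3$ algorithm (which handles contexts and general constraints); it is a hypothetical illustration of the two ingredients the abstract names: optimism in the primal decision (UCB on rewards, LCB on costs) and a dual variable adapted to the observed violation. All names, the noiseless feedback, and the step size `eta` are assumptions for illustration.

```python
import math

def optimistic_primal_dual(rewards, costs, tau, T, eta=0.05):
    """Toy optimistic primal-dual K-armed bandit (illustrative sketch only).

    rewards/costs: true per-arm means, observed noiselessly here for simplicity.
    tau: per-round cost budget; eta: dual step size (hypothetical choice).
    """
    K = len(rewards)
    pulls = [0] * K
    rhat = [0.0] * K   # empirical mean reward per arm
    chat = [0.0] * K   # empirical mean cost per arm
    lam = 0.0          # dual variable penalizing constraint violation
    total_reward = total_cost = 0.0
    for t in range(1, T + 1):
        scores = []
        for a in range(K):
            bonus = math.sqrt(2 * math.log(t) / pulls[a]) if pulls[a] else float("inf")
            ucb_r = rhat[a] + bonus             # optimism for the reward
            lcb_c = max(0.0, chat[a] - bonus)   # optimism for the constraint
            scores.append(ucb_r - lam * lcb_c)  # optimistic Lagrangian (primal step)
        a = max(range(K), key=lambda i: scores[i])
        r, c = rewards[a], costs[a]             # noiseless feedback in this sketch
        pulls[a] += 1
        rhat[a] += (r - rhat[a]) / pulls[a]
        chat[a] += (c - chat[a]) / pulls[a]
        lam = max(0.0, lam + eta * (c - tau))   # projected dual ascent on violation
        total_reward += r
        total_cost += c
    return total_reward / T, total_cost / T, lam, pulls
```

On a two-arm instance where the high-reward arm violates the budget (e.g., rewards `[0.9, 0.5]`, costs `[0.8, 0.1]`, `tau=0.3`), the dual variable rises until the algorithm mixes the two arms so that the long-run average cost hovers near the budget, which is the qualitative behavior a primal-dual scheme is meant to produce.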
Lay Summary: Online decision-making systems, such as recommendation engines and online platforms, often need to make choices that maximize rewards while satisfying important real-world constraints. However, efficiently designing learning algorithms that respect such general constraints remains a major challenge, particularly when the feasibility conditions are difficult to verify or entirely unknown. To address this, we developed Optimistic$^3$, a novel algorithmic framework for contextual bandits with general constraints. Our approach achieves optimal and improved theoretical guarantees even without relying on strong feasibility assumptions, substantially outperforming previous methods that either required such assumptions or exhibited degraded performance under tight constraints.
Link To Code: YjQ2Y
Primary Area: Optimization
Keywords: triple-optimistic framework, contextual bandits, general constraints
Submission Number: 14845