Contextual Bandits with Knapsacks beyond Worst Cases via Re-Solving

Rui Ai; Zhaohua Chen; Xiaotie Deng; Yuqi Pan; Chang Wang; Mingwei Yang

10 May 2023 (modified: 12 Dec 2023). Submitted to NeurIPS 2023.

Keywords: Contextual bandits with knapsacks, Re-solving, Regularity

Abstract: Contextual Bandits with Knapsacks (CBwK) is a fundamental framework for modeling dynamic decision-making under resource constraints. In this framework, an agent observes a request in each round and selects an action, which yields a reward and a resource consumption that both depend on an unknown external factor. The agent's goal is to maximize the total reward subject to the initial inventory. While previous research has established an $\widetilde{O}(\sqrt{T})$ worst-case regret for this problem, this work offers two results that go beyond the worst-case perspective: one characterizes where the worst cases occur, and the other gives logarithmic regret rates. We first show that unique optimality together with degeneracy of the fluid LP problem, a condition that is both succinct and easily verifiable, is sufficient for an $\Omega(\sqrt{T})$ regret lower bound. To complement this worst-case location result, we combine the re-solving heuristic with distribution estimation techniques and propose an algorithm that achieves $\widetilde{O}(1)$ regret whenever the fluid LP has a unique and non-degenerate solution. This condition is mild, as it is satisfied by most problem instances. Furthermore, we prove that our algorithm maintains a near-optimal $\widetilde{O}(\sqrt{T})$ regret even in the worst cases, and we extend these results to the setting where the request and external factor are continuous. Regarding information, our regret results are obtained under two feedback models: in one, the algorithm observes the external factor at the end of every round; in the other, it observes the external factor at the end of a round only when a non-null action is executed.
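
For orientation, a fluid LP of the kind the abstract refers to can be sketched as follows. This is a generic formulation from the bandits-with-knapsacks literature, not necessarily the exact program used in the paper, and all notation is assumed here for illustration: $\theta$ is a request with probability $p(\theta)$, $a$ is an action, $\gamma$ is the external factor, $r$ and $c_j$ are the reward and the consumption of resource $j$, $\rho_j$ is the initial inventory of resource $j$ divided by $T$, and $x(\theta, a)$ is the probability of playing $a$ on request $\theta$ (any remaining probability goes to the null action).

$$
\begin{aligned}
\max_{x} \quad & \sum_{\theta} p(\theta) \sum_{a} x(\theta, a)\, \mathbb{E}_{\gamma}\big[r(\theta, a, \gamma)\big] \\
\text{s.t.} \quad & \sum_{\theta} p(\theta) \sum_{a} x(\theta, a)\, \mathbb{E}_{\gamma}\big[c_j(\theta, a, \gamma)\big] \le \rho_j \quad \text{for every resource } j, \\
& \sum_{a} x(\theta, a) \le 1 \quad \text{for every request } \theta, \\
& x(\theta, a) \ge 0 \quad \text{for every } \theta, a.
\end{aligned}
$$

Under the same assumed notation, a re-solving heuristic would typically solve this program again in each round, replacing $\rho_j$ with the remaining inventory of resource $j$ divided by the number of remaining rounds, and replacing the unknown distributions with their current estimates.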

Supplementary Material: pdf

Submission Number: 6087
