On Preference-based Stochastic Linear Contextual Bandits with Knapsacks

Published: 22 Jan 2025, Last Modified: 10 Mar 2025. AISTATS 2025 Poster. License: CC BY 4.0
Abstract: This paper studies the problem of preference-based stochastic linear contextual bandits with knapsack constraints (PbLinCBwK). We propose budget-aware optimistic and randomized exploration algorithms that achieve a regret of ${O}((\kappa+\frac{T\nu^*}{B})\sqrt{T}\log T)$ for any total budget $B=\Omega(\sqrt{T})$. The parameters $\kappa$ and $\frac{T\nu^*}{B}$ capture the effects of preference feedback and knapsack constraints, respectively. Our regret bound is near-optimal and matches that of LinCBwK under the mild condition $B=\Omega(\sqrt{T})$. To achieve these results, we view the budget consumption process and stopping time as a Markov process and analyze it via the Lyapunov drift method, which translates into the strong regret guarantee. Experiments on synthetic PbLinCBwK instances and an online content moderation setting further support the theoretical results.
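To make the scaling of the stated bound concrete, the following sketch evaluates $(\kappa+\frac{T\nu^*}{B})\sqrt{T}\log T$ for hypothetical parameter values. The constants (`kappa`, `nu_star`, the budget fraction, and the leading constant `c`) are illustrative assumptions, not values from the paper; the snippet only shows that with a budget $B$ proportional to $T$ (which satisfies $B=\Omega(\sqrt{T})$) the bound grows sublinearly in $T$.

```python
import math

def regret_bound(T, B, kappa, nu_star, c=1.0):
    """Evaluate c * (kappa + T * nu_star / B) * sqrt(T) * log(T),
    the shape of the regret bound stated in the abstract.
    All inputs other than T and B are hypothetical constants."""
    return c * (kappa + T * nu_star / B) * math.sqrt(T) * math.log(T)

for T in (10_000, 100_000, 1_000_000):
    B = 0.1 * T  # budget linear in T; trivially satisfies B = Omega(sqrt(T))
    bound = regret_bound(T, B, kappa=2.0, nu_star=0.5)
    # With B proportional to T, the term T * nu_star / B is constant,
    # so the bound scales as sqrt(T) * log(T): sublinear regret.
    print(f"T={T:>9}  bound/T={bound / T:.4f}")
```

As the loop shows, the per-round bound `bound / T` shrinks as `T` grows, which is the sense in which the guarantee is near-optimal (sublinear) regret.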
Submission Number: 993