On Preference-based Stochastic Linear Contextual Bandits with Knapsacks
Abstract: This paper studies the problem of preference-based stochastic linear contextual bandits with knapsack constraints (PbLinCBwK).
We propose budget-aware optimistic and randomized exploration algorithms that achieve regret $O\big((\kappa+\tfrac{T\nu^*}{B})\sqrt{T}\log T\big)$ for any total budget $B=\Omega(\sqrt{T}).$
The parameters $\kappa$ and $\frac{T\nu^*}{B}$ capture the effects of preference feedback and knapsack constraints, respectively. This regret is near-optimal and matches the bound for LinCBwK under the mild condition $B=\Omega(\sqrt{T}).$
To achieve these results, we view the budget consumption process and the stopping time as a Markov process and analyze it via the Lyapunov drift method, which translates into strong regret guarantees.
Experiments on synthetic PbLinCBwK instances and an online content moderation setting further support the theoretical results.
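As a rough illustration of the Lyapunov drift viewpoint mentioned in the abstract (a generic single-resource sketch, not the paper's actual analysis), one can track the remaining budget as a Markov process and convert a per-round drift bound into a bound on the stopping time; the symbols $b_t$, $c_t$, $\bar{c}$, $\tau$ below are placeholders introduced here for illustration only:
% Illustrative sketch only; not the paper's notation or argument.
% b_t: remaining budget, c_t: per-round consumption, tau: budget-exhaustion time.
\begin{align*}
  b_0 &= B, \qquad b_{t+1} = b_t - c_t, \qquad c_t \in [0,1],\\
  \tau &= \min\{t \ge 1 : b_t \le 0\},\\
  \mathbb{E}[c_t \mid \mathcal{F}_t] \le \bar{c}
  &\;\Longrightarrow\;
  B \le \mathbb{E}\Big[\textstyle\sum_{t=0}^{\tau-1} c_t\Big] \le \bar{c}\,\mathbb{E}[\tau]
  \;\Longrightarrow\;
  \mathbb{E}[\tau] \ge B/\bar{c},
\end{align*}
where the middle inequality uses optional stopping (assuming $\mathbb{E}[\tau]<\infty$ and bounded consumption). A drift bound of this form is one standard way a budget condition such as $B=\Omega(\sqrt{T})$ controls the effective horizon $\tau$; handling multiple resources and coupling the drift with the regret, as required for PbLinCBwK, is beyond this sketch.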
Submission Number: 993