Cascading Contextual Assortment Bandits

Published: 21 Sept 2023, Last Modified: 16 Jan 2024NeurIPS 2023 posterEveryoneRevisionsBibTeX
Keywords: cascade bandit, assortment bandit, upper confidence bound, exploration and exploitation, combinatorial optimization
TL;DR: We propose UCB-type algorithm for cascading contextual assortment bandits with the sharpest regret upper bound.
Abstract: We present a new combinatorial bandit model, the \textit{cascading contextual assortment bandit}. This model serves as a generalization of both existing cascading bandits and assortment bandits, broadening their applicability in practice. For this model, we propose our first UCB bandit algorithm, UCB-CCA. We prove that this algorithm achieves a $T$-step regret upper-bound of $\tilde{\mathcal{O}}(\frac{1}{\kappa}d\sqrt{T})$, sharper than existing bounds for cascading contextual bandits by eliminating dependence on cascade length $K$. To improve the dependence on problem-dependent constant $\kappa$, we introduce our second algorithm, UCB-CCA+, which leverages a new Bernstein-type concentration result. This algorithm achieves $\tilde{\mathcal{O}}(d\sqrt{T})$ without dependence on $\kappa$ in the leading term. We substantiate our theoretical claims with numerical experiments, demonstrating the practical efficacy of our proposed methods.
Supplementary Material: pdf
Submission Number: 10893
Loading