True Impact of Cascade Length in Contextual Cascading Bandits

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY-NC-SA 4.0
Keywords: contextual bandit, cascading bandit, upper confidence bound, online learning
TL;DR: We show that in contextual cascading bandits, regret vanishes as the cascade length grows, with nearly matching upper and lower bounds.
Abstract: We revisit the contextual cascading bandit, where a learning agent recommends an ordered list (\emph{cascade}) of items, and a user scans the list sequentially, stopping at the first attractive item. Although cascading bandits underpin various applications including recommender systems and search engines, the role of the cascade length $K$ in shaping regret has remained unclear. Contrary to prior results suggesting that regret grows with $K$, we prove that regret actually \emph{decreases} once $K$ is large enough. Leveraging this insight, we design a new upper-confidence-bound algorithm built on online mirror descent that attains the sharpest known regret upper bound for contextual cascading bandits, $\tilde{\mathcal{O}}\bigl(\min \lbrace K\bar{p}^{K-1}, 1 \rbrace d \sqrt{T}\bigr)$. To complement this upper bound, we provide a nearly matching lower bound of $\Omega \bigl(\min \lbrace K\underline{p}^{K-1}, 1 \rbrace d \sqrt{T}\bigr)$, where $0 \leq \underline{p} \leq \bar{p} < 1$. Together, these results fully characterize how regret truly scales with $K$, thereby closing the theoretical gap for contextual cascading bandits. Finally, comprehensive experiments validate our theoretical results and show the effectiveness of our proposed method.
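
To make the abstract's two ingredients concrete, here is a minimal sketch (not the paper's implementation): a simulation of cascade feedback, and a numerical look at the $\min\{K\bar{p}^{K-1}, 1\}$ factor in the regret bound. The value of `p`, the `cascade_feedback` helper, and the reading of $\bar{p}$ as a per-item non-attraction probability bound are our assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def cascade_feedback(attraction_probs, rng):
    """Simulate one round of cascade feedback: the user scans the
    recommended list in order and stops at the first attractive item.
    Returns the clicked position, or None if no item attracts."""
    for k, p in enumerate(attraction_probs):
        if rng.random() < p:  # item at position k attracts the user
            return k          # user clicks and stops scanning
    return None               # user examined the whole list; no click

# One simulated round with hypothetical attraction probabilities.
print(cascade_feedback([0.1, 0.3, 0.2], rng))

# The K-dependent factor from the regret bound, with a hypothetical
# per-item non-attraction bound p: it saturates at 1 for small K and
# decays geometrically once K is large, matching the abstract's claim
# that regret eventually decreases as the cascade grows.
p = 0.8
for K in (1, 2, 5, 10, 20, 40):
    print(K, min(K * p ** (K - 1), 1.0))
```

Intuitively, once $K$ is large, $\bar{p}^{K-1}$ (the chance the user reaches the end of the list without clicking) shrinks geometrically and dominates the linear growth in $K$, which is why the factor, and with it the regret, vanishes for long cascades.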
Supplementary Material: zip
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 16781