Linear Contextual Bandits with Hybrid Payoff: Revisited

Published: 01 Jan 2024, Last Modified: 12 May 2025 · ECML/PKDD (6) 2024 · CC BY-SA 4.0
Abstract: We study the Linear Contextual Bandit (LinearCB) problem in the hybrid reward setting. In this setting, every arm's reward model contains arm-specific parameters in addition to parameters shared across the reward models of all arms. This setting reduces easily to two closely related settings: (a) Shared, with no arm-specific parameters, and (b) Disjoint, with only arm-specific parameters, enabling the application of two popular state-of-the-art algorithms, LinUCB and DisLinUCB (proposed as Algorithm 1 in Li et al. 2010). When the arm features are stochastic and satisfy a popular diversity condition, we provide new regret analyses for both LinUCB and DisLinUCB that significantly improve upon the known regret guarantees of these algorithms. Our analysis critically exploits the structure of the hybrid rewards and the diversity of the arm features. Beyond proving these new guarantees, we introduce a new algorithm, HyLinUCB, which crucially modifies LinUCB (using a new exploration coefficient) to account for sparsity in the hybrid setting. Under the same diversity assumptions, we prove that at the end of T rounds, HyLinUCB also incurs only \(\tilde{O}(\sqrt{T})\) regret. We perform extensive experiments on synthetic and real-world datasets that demonstrate the strong empirical performance of HyLinUCB. When the number of arm-specific parameters is much larger than the number of shared parameters, DisLinUCB incurs the lowest regret; in this case, HyLinUCB is second best and remains highly competitive with DisLinUCB. In all other situations, including our real-world dataset, HyLinUCB has significantly lower regret than LinUCB, DisLinUCB, and the other state-of-the-art baselines we considered. We also observe empirically that the regret of HyLinUCB grows much more slowly with the number of arms K than that of the baselines, making it suitable even for very large action spaces.
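For concreteness, the hybrid payoff model of Li et al. (2010) posits an expected reward of \(\mathbb{E}[r_t(i)] = z_t(i)^\top \beta^* + x_t(i)^\top \theta_i^*\), where \(\beta^*\) is shared across all arms and each \(\theta_i^*\) is arm-specific. The sketch below is not the authors' code; all names (`stack`, `linucb_choose`, the dimensions, the placeholder reward) are illustrative. It shows the "Shared" reduction mentioned in the abstract: the shared features are stacked with a block-sparse copy of the arm-specific features, so that plain LinUCB applies to the combined vector.

```python
import numpy as np

# Minimal sketch of the "Shared" reduction for the hybrid model (assumed form):
#   E[r_t(i)] = z_t(i)^T beta* + x_t(i)^T theta_i*
# beta* (dim d_shared) is shared across arms; theta_i* (dim d_arm) is arm-specific.

def stack(z, x, arm, K, d_arm):
    """Embed hybrid features into one long vector: the shared part first, then
    the arm-specific block for `arm`; the other arms' blocks stay zero."""
    v = np.zeros(len(z) + K * d_arm)
    v[:len(z)] = z
    v[len(z) + arm * d_arm : len(z) + (arm + 1) * d_arm] = x
    return v

def linucb_choose(A_inv, b, feats, alpha):
    """Standard LinUCB rule on the stacked features: pick the arm maximizing
    the estimated reward plus an exploration bonus of width alpha."""
    theta_hat = A_inv @ b
    scores = [f @ theta_hat + alpha * np.sqrt(f @ A_inv @ f) for f in feats]
    return int(np.argmax(scores))

# Tiny usage example with random contexts.
rng = np.random.default_rng(0)
K, d_shared, d_arm, alpha = 3, 2, 2, 1.0
d = d_shared + K * d_arm
A_inv, b = np.eye(d), np.zeros(d)  # ridge statistics: A = I + sum of f f^T
for t in range(100):
    feats = [stack(rng.normal(size=d_shared), rng.normal(size=d_arm), i, K, d_arm)
             for i in range(K)]
    i = linucb_choose(A_inv, b, feats, alpha)
    r = rng.normal()  # placeholder reward; a real run would observe the environment
    f = feats[i]
    # Sherman-Morrison rank-one update of A^{-1} after adding f f^T to A.
    Af = A_inv @ f
    A_inv -= np.outer(Af, Af) / (1.0 + f @ Af)
    b += r * f
```

The "Disjoint" reduction would instead keep separate `(A_inv, b)` statistics per arm over the arm-specific features alone, as DisLinUCB does; per the abstract, HyLinUCB departs from the above by using a new exploration coefficient in place of `alpha` to account for the block-sparsity of the stacked features.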