Interconnected Neural Linear Contextual Bandits with UCB Exploration

PAKDD (1) 2022 (modified: 25 Apr 2023)
Abstract: Contextual multi-armed bandit algorithms are widely used to solve online decision-making problems. However, traditional methods assume linear rewards and low-dimensional contextual information, leading to high regret and low online efficiency in real-world applications. In this paper, we propose a novel framework called interconnected neural-linear UCB (InlUCB) that interleaves two learning processes: an offline representation learning part, which converts the original contextual information into low-dimensional latent features via a non-linear transformation, and an online exploration part, which updates a linear layer using upper confidence bounds (UCB). Together, these two processes yield an effective and efficient strategy for online decision-making problems with non-linear rewards and high-dimensional contexts. We derive a general expression for the finite-time cumulative regret bound of InlUCB, and give a tighter regret bound under certain assumptions on the neural network. We test InlUCB against state-of-the-art bandit methods on synthetic and real-world datasets with non-linear rewards and high-dimensional contexts. Results demonstrate that InlUCB significantly improves cumulative regret and online efficiency.
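The neural-linear decomposition the abstract describes — a non-linear map to low-dimensional latent features, with linear UCB exploration on top — can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the feature map here is a fixed random tanh layer standing in for the offline-trained network, and the dimensions, arm count, and exploration parameter `alpha` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: d-dimensional raw contexts compressed to k latent
# features; W is a stand-in for the offline-learned network weights.
d, k, n_arms, alpha = 20, 5, 4, 1.0
W = rng.normal(size=(k, d))

def phi(x):
    """Non-linear map from raw context to k-dim latent features."""
    return np.tanh(W @ x)

# Per-arm linear-UCB statistics, maintained on the latent features.
A = [np.eye(k) for _ in range(n_arms)]    # regularized design matrices
b = [np.zeros(k) for _ in range(n_arms)]  # reward-weighted feature sums

def select_arm(x):
    """Pick the arm with the highest upper confidence bound."""
    z = phi(x)
    ucbs = []
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        theta = A_inv @ b[a]                       # ridge estimate
        ucbs.append(theta @ z + alpha * np.sqrt(z @ A_inv @ z))
    return int(np.argmax(ucbs))

def update(arm, x, reward):
    """Update the chosen arm's statistics with the observed reward."""
    z = phi(x)
    A[arm] += np.outer(z, z)
    b[arm] += reward * z

# One simulated round of interaction.
x = rng.normal(size=d)
arm = select_arm(x)
update(arm, x, reward=1.0)
```

In InlUCB the two parts are interleaved rather than fixed: the representation network would be periodically retrained offline on logged interactions, while the linear-UCB layer above handles online exploration between retraining phases.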