Neural Contextual Bandits with Deep Representation and Shallow Exploration

Pan Xu; Zheng Wen; Handong Zhao; Quanquan Gu

21 May 2021 (modified: 05 May 2023) | NeurIPS 2021 Submitted | Readers: Everyone

Keywords: Deep Learning, Multi-armed Bandits, Contextual Bandits, Computational Efficiency

Abstract: We study neural contextual bandits, a general class of contextual bandits in which each context-action pair is associated with a raw feature vector but the reward-generating function is unknown. We propose a novel learning algorithm that transforms the raw feature vector using the last hidden layer of a deep ReLU neural network (deep representation learning) and uses an upper confidence bound (UCB) approach to explore in the last linear layer (shallow exploration). We prove that, under standard assumptions, our proposed algorithm achieves $\tilde{O}(\sqrt{T})$ finite-time regret, where $T$ is the learning time horizon. Compared with existing neural contextual bandit algorithms, our approach is computationally much more efficient because it explores only in the last layer of the deep neural network.
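The two ingredients named in the abstract, a representation taken from the last hidden layer and UCB exploration confined to the last linear layer, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the deep representation is replaced by a single frozen ReLU layer with random weights (the paper uses the last hidden layer of a trained deep ReLU network), the exploration weight `alpha` is a fixed constant rather than a theoretically derived confidence radius, the reward function is a made-up nonlinear example, and all names (`relu_features`, `ShallowLinUCB`, `run_demo`) are hypothetical. The inverse design matrix is maintained with a Sherman-Morrison rank-one update.

```python
import math
import random


def relu_features(x, weights):
    """Hypothetical stand-in for the deep representation phi(x): one frozen ReLU layer."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def mat_vec(M, v):
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class ShallowLinUCB:
    """LinUCB-style exploration restricted to the d-dimensional last-layer features."""

    def __init__(self, d, lam=1.0, alpha=1.0):
        self.alpha = alpha  # fixed exploration weight (assumption)
        # A = lam * I initially, so A^{-1} = (1 / lam) * I
        self.A_inv = [[(1.0 / lam) if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b = [0.0] * d  # running sum of reward * phi

    def theta(self):
        # Ridge estimate of the last-layer weights: theta = A^{-1} b
        return mat_vec(self.A_inv, self.b)

    def ucb(self, phi, theta=None):
        # Score = estimated reward + alpha * sqrt(phi^T A^{-1} phi)
        theta = self.theta() if theta is None else theta
        width = math.sqrt(max(dot(phi, mat_vec(self.A_inv, phi)), 0.0))
        return dot(phi, theta) + self.alpha * width

    def select(self, phis):
        theta = self.theta()  # compute once per round
        scores = [self.ucb(phi, theta) for phi in phis]
        return max(range(len(phis)), key=lambda k: scores[k])

    def update(self, phi, reward):
        # Sherman-Morrison: (A + phi phi^T)^{-1}
        #   = A^{-1} - (A^{-1} phi)(A^{-1} phi)^T / (1 + phi^T A^{-1} phi)
        u = mat_vec(self.A_inv, phi)  # A^{-1} is symmetric, so u^T = phi^T A^{-1}
        denom = 1.0 + dot(phi, u)
        for i in range(len(phi)):
            for j in range(len(phi)):
                self.A_inv[i][j] -= u[i] * u[j] / denom
            self.b[i] += reward * phi[i]


def run_demo(T=200, K=4, raw_dim=5, d=8, seed=0):
    rng = random.Random(seed)
    # Frozen random hidden layer (hypothetical; the paper trains the network)
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(raw_dim)) for _ in range(raw_dim)] for _ in range(d)]
    bandit = ShallowLinUCB(d, lam=1.0, alpha=0.5)
    total = 0.0
    for _ in range(T):
        contexts = [[rng.uniform(-1.0, 1.0) for _ in range(raw_dim)] for _ in range(K)]
        phis = [relu_features(x, W) for x in contexts]
        k = bandit.select(phis)
        # Hypothetical unknown nonlinear reward plus Gaussian noise
        reward = math.sin(sum(contexts[k])) + rng.gauss(0.0, 0.1)
        bandit.update(phis[k], reward)
        total += reward
    return total, bandit


if __name__ == "__main__":
    total, bandit = run_demo()
    print(f"cumulative reward over 200 rounds: {total:.2f}")
```

Because the confidence set lives only in the d-dimensional last-layer feature space, each round in this sketch costs O(K d^2) for scoring K arms plus O(d^2) for the rank-one update, independent of how many parameters the layers below contain. Calling `run_demo()` plays 200 rounds with 4 arms and returns the cumulative reward together with the fitted bandit.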
