Expected Improvement-based Contextual Bandits

Published: 28 Jan 2022, Last Modified: 13 Feb 2023, ICLR 2022 Submitted
Keywords: Linear Bandits, Contextual Bandits, Expected Improvement, Neural Tangent Kernel
Abstract: Expected improvement (EI) is a popular technique for handling the tradeoff between exploration and exploitation under uncertainty. However, compared to other techniques such as the Upper Confidence Bound (UCB) and Thompson Sampling (TS), the theoretical properties of EI have not been well studied, even in non-contextual settings such as the standard bandit and Bayesian optimization. In this paper, we introduce and study the EI technique as a new tool for the contextual bandit problem, a generalization of the standard bandit. We propose two novel EI-based algorithms for this problem: one for when the reward function is assumed to be linear, and one for when no assumption is made about the reward function other than that it is bounded. With a linear reward function, we demonstrate that our algorithm achieves near-optimal regret. In particular, our regret bound improves on that of the popular OFUL algorithm \citep{Abbasi11}, which uses the UCB approach, by a factor of $\sqrt{\log(T)}$, and on that of another popular algorithm \citep{agrawal13}, which uses the TS approach, by a factor of $\sqrt{d\log(T)}$. Here $T$ is the horizon and $d$ is the feature vector dimension. Further, when no assumptions are made about the form of the reward, we use deep neural networks to model the reward function. We prove that this algorithm also achieves near-optimal regret. Finally, we provide an empirical evaluation of the algorithms on both synthetic functions and various benchmark datasets. Our experiments show that our algorithms work well and consistently outperform existing approaches.
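To illustrate the general idea behind EI-based arm selection in a linear contextual bandit, the sketch below uses a ridge-regression (Gaussian) posterior over the reward parameter and picks the arm maximizing the standard EI acquisition. This is a minimal, hypothetical sketch under those assumptions, not the paper's exact algorithm; the class and parameter names (`LinearEIBandit`, `lam`, `noise_sd`) are invented for illustration.

```python
import numpy as np
from scipy.stats import norm

class LinearEIBandit:
    """Illustrative EI-based linear contextual bandit (hypothetical sketch).

    Assumes rewards follow r = x^T theta + noise and maintains a
    ridge-regression posterior over theta.
    """

    def __init__(self, d, lam=1.0, noise_sd=1.0):
        self.V = lam * np.eye(d)   # regularized Gram matrix of observed contexts
        self.b = np.zeros(d)       # sum of reward-weighted contexts
        self.noise_sd = noise_sd

    def select(self, contexts):
        """contexts: (K, d) array with one feature vector per arm."""
        V_inv = np.linalg.inv(self.V)
        theta_hat = V_inv @ self.b
        mu = contexts @ theta_hat                                   # posterior means
        sigma = self.noise_sd * np.sqrt(
            np.sum((contexts @ V_inv) * contexts, axis=1))          # posterior stds
        incumbent = mu.max()                                        # best estimated reward so far
        z = (mu - incumbent) / np.maximum(sigma, 1e-12)
        ei = (mu - incumbent) * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
        return int(np.argmax(ei))

    def update(self, x, reward):
        self.V += np.outer(x, x)
        self.b += reward * x
```

A typical interaction loop would, at each round, observe the arms' context vectors, call `select` to choose an arm, observe the reward, and call `update` with the chosen context and reward; the neural variant described in the paper would replace the linear posterior with one derived from a deep network.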