Abstract: Bayesian optimization is a powerful framework for global search via maximum a posteriori updates rather than simulated annealing. We cast it as a multi-armed bandit problem whose payoff function is sampled from a Gaussian process (GP), and we focus on action selection via the upper confidence bound (UCB) or expected improvement (EI) acquisition rules, owing to their prevalence in practice. Prior works using GPs for bandits cannot allow the iteration horizon $T$ to grow large, since the complexity of computing the posterior parameters scales cubically with the number of past observations. To address this, we propose a simple statistical test: incorporate an action into the GP posterior only when its conditional entropy exceeds a threshold $\epsilon$. Doing so permits us to precisely characterize the tradeoff between the regret bounds of GP bandit algorithms and the complexity of the posterior distributions as a function of the compression parameter $\epsilon$, for both discrete and continuous action sets. To the best of our knowledge, this is the first result that attains sublinear regret while keeping the growth rate of the posterior complexity sublinear, whereas it is linear in the existing literature. Moreover, a provably finite bound on the complexity can be achieved, but the algorithm then incurs $\epsilon$-regret, i.e., $\textbf{Reg}_{T}/T\rightarrow\mathcal{O}(\epsilon)$ as $T\rightarrow\infty$. Experiments demonstrate state-of-the-art accuracy and complexity tradeoffs for GP bandit algorithms in global optimization, highlighting the benefits of compressed GPs in bandit settings.
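The following is a minimal sketch of the $\epsilon$-thresholded posterior update described above: an action-observation pair is appended to the GP only when its conditional (predictive) entropy under the current posterior exceeds $\epsilon$, and actions are chosen by a UCB rule. The kernel, noise level, $\epsilon$ value, and all names (`CompressedGP`, `maybe_update`, etc.) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel matrix between two sets of points."""
    d = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-0.5 * d / lengthscale**2)

class CompressedGP:
    """GP posterior that only absorbs sufficiently informative observations."""
    def __init__(self, noise=1e-2, epsilon=0.1):
        self.noise, self.epsilon = noise, epsilon
        self.X, self.y = np.empty((0, 1)), np.empty(0)

    def posterior(self, Xs):
        """Posterior mean and variance at query points Xs."""
        if len(self.y) == 0:
            return np.zeros(len(Xs)), np.ones(len(Xs))  # prior mean 0, variance 1
        K = rbf(self.X, self.X) + self.noise * np.eye(len(self.y))
        Ks = rbf(self.X, Xs)
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, self.y))
        v = np.linalg.solve(L, Ks)
        mu = Ks.T @ alpha
        var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
        return mu, var

    def maybe_update(self, x, y):
        """Append (x, y) only if its conditional entropy exceeds epsilon."""
        _, var = self.posterior(x[None, :])
        # Differential entropy of the Gaussian predictive distribution at x.
        cond_entropy = 0.5 * np.log(2 * np.pi * np.e * (var[0] + self.noise))
        if cond_entropy > self.epsilon:
            self.X = np.vstack([self.X, x[None, :]])
            self.y = np.append(self.y, y)

# Toy GP-UCB loop on a one-dimensional objective (illustration only).
f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x
grid = np.linspace(-2.0, 2.0, 200)[:, None]
gp, beta = CompressedGP(epsilon=0.05), 2.0
for t in range(50):
    mu, var = gp.posterior(grid)
    x_t = grid[np.argmax(mu + beta * np.sqrt(var))]   # UCB action selection
    y_t = f(x_t[0]) + 1e-2 * np.random.randn()        # noisy payoff
    gp.maybe_update(x_t, y_t)                         # epsilon compression test
print("posterior size:", len(gp.y))
```

With a larger $\epsilon$ the test rejects more points, so the posterior stays small (and cheap to update) at the cost of the $\epsilon$-dependent regret described in the abstract; with $\epsilon \to 0$ the sketch reduces to a standard GP-UCB update on every observation.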