Optimal Order Simple Regret for Gaussian Process Bandits

21 May 2021, 20:43 (modified: 25 Oct 2021, 19:04) · NeurIPS 2021 Poster
Keywords: Gaussian Process Bandit, Confidence Intervals, RKHS, Optimal Order Simple Regret
TL;DR: We prove order optimal simple regret for the Gaussian process bandit problem when the objective function is in a reproducing kernel Hilbert space (RKHS).
Abstract: Consider the sequential optimization of a continuous, possibly non-convex, and expensive-to-evaluate objective function $f$. The problem can be cast as a Gaussian Process (GP) bandit where $f$ lives in a reproducing kernel Hilbert space (RKHS). The state-of-the-art analysis of several learning algorithms shows a significant gap between the lower and upper bounds on the simple regret performance. When $N$ is the number of exploration trials and $\gamma_N$ is the maximal information gain, we prove an $\tilde{\mathcal{O}}(\sqrt{\gamma_N/N})$ bound on the simple regret performance of a pure exploration algorithm that is significantly tighter than the existing bounds. We show that this bound is order optimal up to logarithmic factors for the cases where a lower bound on regret is known. To establish these results, we prove novel and sharp confidence intervals for GP models applicable to RKHS elements, which may be of broader interest.
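The abstract describes a pure exploration setting: spend $N$ trials querying a noisy black-box $f$, fit a GP posterior, and recommend a point whose gap to the true maximum is the simple regret. The sketch below is a minimal, hedged illustration of that setup with NumPy only; the RBF kernel, uniform exploration, and posterior-mean recommendation rule are illustrative assumptions, not the paper's actual algorithm or confidence intervals.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=0.2):
    # Squared-exponential kernel; a common RKHS choice (assumption, not from the paper).
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(X_obs, y_obs, X_query, noise=1e-2, lengthscale=0.2):
    # Standard GP regression posterior mean and variance at the query points.
    K = rbf_kernel(X_obs, X_obs, lengthscale) + noise * np.eye(len(X_obs))
    Ks = rbf_kernel(X_obs, X_query, lengthscale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v * v, axis=0)  # prior variance is k(x, x) = 1 for RBF
    return mean, np.maximum(var, 0.0)

# Toy smooth objective standing in for the RKHS element f.
f = lambda x: np.sin(3 * x) + 0.5 * np.cos(7 * x)

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 200)

# Pure exploration: N uniform noisy queries (a simple stand-in for the paper's
# exploration scheme), then recommend the maximizer of the posterior mean.
N = 30
X_obs = rng.uniform(0.0, 1.0, N)
y_obs = f(X_obs) + 0.1 * rng.standard_normal(N)

mean, var = gp_posterior(X_obs, y_obs, grid)
x_hat = grid[np.argmax(mean)]
simple_regret = f(grid).max() - f(x_hat)
print(f"recommended x = {x_hat:.3f}, simple regret = {simple_regret:.4f}")
```

As $N$ grows, the posterior concentrates and the simple regret of the recommended point shrinks; the paper's contribution is showing this decay is $\tilde{\mathcal{O}}(\sqrt{\gamma_N/N})$, matching known lower bounds up to log factors.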
Supplementary Material: pdf
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
Code: zip