Improved Regret Analysis in Gaussian Process Bandits: Optimality for Noiseless Reward, RKHS norm, and Non-Stationary Variance
Abstract: We study the Gaussian process (GP) bandit problem, in which the goal is to minimize regret under an unknown reward function lying in some reproducing kernel Hilbert space (RKHS).
The analysis of the maximum posterior variance is vital for establishing the near-optimality of GP bandit algorithms such as maximum variance reduction (MVR) and phased elimination (PE).
We therefore first show a new upper bound on the maximum posterior variance, which improves the dependence on the noise variance parameter of the GP. Leveraging this result, we refine MVR and PE to obtain (i) a nearly optimal regret upper bound in the noiseless setting and (ii) regret upper bounds that are optimal with respect to the RKHS norm of the reward function. Furthermore, as another application of our proposed bound, we analyze the GP bandit in the time-varying noise variance setting, which is the kernelized extension of the linear bandit with heteroscedastic noise. For this problem, we show that MVR- and PE-based algorithms achieve noise-variance-dependent regret upper bounds that match our regret lower bound.
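For concreteness, below is a minimal sketch (not the authors' implementation) of the two objects the abstract centers on: the GP posterior variance sigma_t^2(x) = k(x, x) - k_t(x)^T (K_t + sigma^2 I)^{-1} k_t(x), whose dependence on the noise variance parameter sigma^2 is what the new bound sharpens, and the MVR rule that queries the point of maximum posterior variance. The RBF kernel, `noise_var`, and all parameter values are illustrative assumptions.

```python
# Minimal sketch of GP posterior variance and the MVR query rule.
# Kernel choice and noise_var are illustrative assumptions, not the paper's setup.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def posterior_variance(X_train, X_cand, noise_var=1e-2):
    """sigma_t^2(x) = k(x, x) - k_t(x)^T (K_t + noise_var * I)^{-1} k_t(x)."""
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    k_cand = rbf_kernel(X_train, X_cand)         # shape (n_train, n_cand)
    prior = np.diag(rbf_kernel(X_cand, X_cand))  # k(x, x) for each candidate
    sol = np.linalg.solve(K, k_cand)
    return prior - np.einsum('ij,ij->j', k_cand, sol)

def mvr_next_query(X_train, X_cand, noise_var=1e-2):
    """Maximum variance reduction: query the candidate of largest posterior variance."""
    return X_cand[np.argmax(posterior_variance(X_train, X_cand, noise_var))]

# Usage: two observed points on [0, 1], pick the next query from a grid.
X_train = np.array([[0.1], [0.5]])
X_cand = np.linspace(0.0, 1.0, 101)[:, None]
x_next = mvr_next_query(X_train, X_cand)
```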
Lay Summary: In many decision-making situations, we need to learn which options work best by trying them and observing the results.
This process is tricky because we often do not know how our choices lead to desirable rewards, and the results can be noisy or unclear, especially when the conditions change over time. Many existing methods struggle to deal with this kind of uncertainty.
Our research improves the way we analyze decision-making strategies in such uncertain environments.
We developed a new way to better measure how unsure a system is about its predictions.
Using this, we improved the analysis of two important learning methods, showing that they perform better than previously known, especially when there is little noise or when the amount of noise changes over time.
These improvements help us understand how learning systems behave in challenging situations.
We expect that our results will lead to better tools for real-world decision-making applications.
Primary Area: Theory->Online Learning and Bandits
Keywords: Gaussian process bandits, kernel bandits, noiseless setting
Submission Number: 2194