Keywords: Bayesian optimization, Gaussian process, Acquisition function, Hyperparameter uncertainty, Offline-to-online, Sample efficiency
TL;DR: BO is inefficient with both too few and too many random initial points, except under Thompson Sampling.
Abstract: Bayesian Optimization (BO) pipelines typically begin with an initialization phase: a batch of $n_0$ uninformed evaluations. The choice of $n_0$ is usually guided by rules of thumb. We empirically observe that the total cost (random initial points plus the BO iterations needed to find the global optimum) is U-shaped in $n_0$, i.e., a practitioner wastes resources by selecting either too low or too high a value of $n_0$. We find that this tradeoff persists across MLE, Bayesian MCMC, and exact GP hyperparameters, as well as across acquisition functions. Among acquisition functions, Thompson Sampling appears to be the exception, with both total cost and simple regret essentially $n_0$-agnostic, though at a uniformly higher level. We attribute the U-shape to the known boundary issue of variance-driven BO: BO burns early budget on the corners of the hypercube before turning inward. We demonstrate this effect using a 3D BO trajectory for which the exact hyperparameters are known. We conclude with practical recommendations: using Thompson Sampling for $n_0$-insensitive deployments, and taking a generously large $n_0$ to improve downstream BO performance.
Submission Number: 87
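To make the abstract's notion of total cost concrete, the following is a minimal sketch of how $n_0$ random initial points plus subsequent BO iterations could be counted. It is not the authors' experimental setup: the names `rbf`, `gp_posterior`, `toy_objective`, and `total_cost`, the quadratic test function, the fixed kernel hyperparameters, the UCB acquisition, and the random candidate set are all assumptions chosen for illustration.

```python
import numpy as np


def rbf(A, B, lengthscale=0.3):
    # Squared-exponential kernel with unit signal variance (assumed fixed).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def gp_posterior(X, y, Xs, noise=1e-5):
    # Exact zero-mean GP posterior mean and variance at the query points Xs.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha
    var = np.clip(1.0 - (v**2).sum(0), 0.0, None)
    return mean, var


def toy_objective(X):
    # Hypothetical objective on [0, 1]^d with maximum value 0 at x = 0.5.
    return -((X - 0.5) ** 2).sum(-1)


def total_cost(n0, dim=2, max_iters=50, tol=1e-2, beta=2.0, seed=0):
    # Total cost = n0 random evaluations + BO iterations until the best
    # observed value is within tol of the (known) optimum, capped at max_iters.
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n0, dim))        # uninformed initialization
    y = toy_objective(X)
    cand = rng.uniform(size=(2000, dim))   # random candidate set for the acquisition
    for t in range(max_iters):
        if y.max() >= -tol:
            return n0 + t
        mean, var = gp_posterior(X, y, cand)
        x_next = cand[np.argmax(mean + beta * np.sqrt(var))]  # UCB (an assumption)
        X = np.vstack([X, x_next])
        y = np.append(y, toy_objective(x_next[None, :]))
    return n0 + max_iters


if __name__ == "__main__":
    for n0 in (1, 5, 20, 40):
        costs = [total_cost(n0, seed=s) for s in range(5)]
        print(n0, np.mean(costs))
```

Sweeping `n0` and averaging over seeds, as in the final loop, is one way to trace total cost against $n_0$; whether a U-shape shows up in this toy setting depends on the objective, dimension, and acquisition chosen here.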