Abstract: Thompson sampling is a method for Bayesian optimisation whereby, at each round, a random realisation of the belief over the objective function is drawn and then optimised, informing the next observation point.
The belief is typically maintained using a sufficiently expressive Gaussian process (GP) surrogate of the true objective function.
The sample drawn is non-convex in general and non-trivial to optimise.
Motivated by the desire to make this optimisation subproblem more tractable, we propose difference-of-convex Thompson sampling (DCTS): a scalable method for drawing GP samples that combines random neural network features with pathwise updates on the limiting kernel. The resulting samples belong to the class of difference-of-convex functions, which are inherently easier to optimise while retaining rich expressive power. We establish sublinear cumulative regret bounds using a simplified proof technique and demonstrate the advantages of our framework on various problems, including synthetic test functions, hyperparameter tuning, and computationally expensive physics simulations.
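For intuition, the sketch below illustrates the generic Thompson sampling loop the abstract describes: draw an approximate sample of the GP posterior, optimise it, and observe the objective at the optimiser. It is not the paper's DCTS method; standard random Fourier features and a grid search stand in for the neural-network features, pathwise update, and difference-of-convex optimisation, and every name and hyperparameter is illustrative.

# Minimal sketch of GP Thompson sampling with random Fourier features
# (illustrative only; not the DCTS procedure proposed in the paper).
import numpy as np

rng = np.random.default_rng(0)

def objective(x):                       # toy black-box objective (assumed)
    return np.sin(3 * x) + 0.1 * x**2

# Random Fourier features approximating an RBF kernel with lengthscale ell.
ell, n_feat, noise = 0.5, 200, 1e-2
omega = rng.normal(0.0, 1.0 / ell, size=n_feat)
phase = rng.uniform(0.0, 2 * np.pi, size=n_feat)

def features(x):
    return np.sqrt(2.0 / n_feat) * np.cos(np.outer(x, omega) + phase)

X = rng.uniform(-2, 2, size=3)          # initial design
y = objective(X)
grid = np.linspace(-2, 2, 400)          # candidate set standing in for an optimiser

for t in range(20):
    Phi = features(X)
    # Posterior over feature weights (Bayesian linear regression in feature space).
    A = Phi.T @ Phi / noise + np.eye(n_feat)
    mean = np.linalg.solve(A, Phi.T @ y / noise)
    cov = np.linalg.inv(A)
    # Thompson sampling: draw one weight vector, i.e. one approximate sample path.
    w = rng.multivariate_normal(mean, cov)
    sample_path = features(grid) @ w
    # Optimise the drawn sample (here by exhaustive search) to pick the next query.
    x_next = grid[np.argmin(sample_path)]
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

print("best observed value:", y.min(), "at x =", X[np.argmin(y)])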
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Roman_Garnett1
Submission Number: 6592