Keywords: Bayesian optimization, Transformers, Synthetic Data, PFNs
TL;DR: Strong Bayesian optimization performance using a neural network meta-learned on synthetic data, sampled from a prior, as surrogate.
Abstract: Bayesian Optimization (BO) is an effective approach to optimize black-box functions, relying on a probabilistic surrogate to model the response surface. In this work, we propose to use a Prior-data Fitted Network (PFN) as a cheap and flexible surrogate. PFNs are neural networks that approximate the Posterior Predictive Distribution (PPD) in a single forward-pass. Most importantly, they can approximate the PPD for any prior distribution that we can sample from efficiently. Additionally, we show what is required for PFNs to be used in a standard BO setting with common acquisition functions. We evaluated the performance of a PFN surrogate for Hyperparameter optimization (HPO), a major application of BO. While the method can still fail for some search spaces, we fare comparable or better than the state-of-the-art on the HPO-B and PD1 benchmark.