α-PFN: In-Context Learning Entropy Search

Published: 06 Mar 2025, Last Modified: 24 Apr 2025 · FPI-ICLR2025 Poster · CC BY 4.0
Keywords: prior-data fitted networks, Bayesian Optimization, entropy search, transformer, meta-learning, information-theoretic acquisition functions, in-context learning
TL;DR: We use the framework of Prior-data Fitted Networks (PFNs) to develop α-PFN, a transformer that learns to approximate the Entropy Search acquisition function in a single forward pass, enabling fast Bayesian Optimization.
Abstract: We show how Prior-data Fitted Networks (PFNs) can be adapted to efficiently predict Entropy Search (ES), an information-theoretic acquisition function. PFNs were previously shown to accurately approximate Gaussian Process (GP) predictions. To approximate ES, we extend them to condition on information about the optimum of the underlying function. Conditioning on this information is not straightforward, and previous methods relied on complex, handcrafted, and/or computationally heavy approximations. PFNs, however, offer learned approximations that require just a single forward pass. Additionally, we train $\alpha$-PFN, a new type of PFN model, on the information gains predicted by the first model, letting us directly predict the value of the acquisition function in a single forward pass and effectively avoiding the traditional sampling-based approximations. This approach makes using Entropy Search and its variations straightforward and efficient in practice. We validate our approach empirically on synthetic GP samples of up to six dimensions, where the $\alpha$-PFN matches or improves upon the regrets obtained by current approximations to predictive and joint Entropy Search, at a reduced computational cost. While this provides an initial proof of concept, the real potential of our method lies in its ability to efficiently perform Entropy Search for arbitrary function priors, unlike the current GP-specific approximations.
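To make the core idea concrete, here is a minimal sketch of the kind of model the abstract describes: a transformer that ingests observed (x, y) pairs plus candidate points and emits a scalar acquisition value per candidate in a single forward pass. The paper provides no code here, so everything below (the class name `AlphaPFNSketch`, the architecture sizes, and the regression setup) is an illustrative assumption, not the authors' implementation; in particular, the real $\alpha$-PFN is trained to regress onto ES information gains produced by a first, optimum-conditioned PFN.

```python
# Minimal sketch (not the authors' code): a transformer that scores a batch
# of candidate points with acquisition values in one forward pass, in the
# spirit of the alpha-PFN described above. Names and sizes are assumptions.
import torch
import torch.nn as nn

class AlphaPFNSketch(nn.Module):
    def __init__(self, dim_x: int, d_model: int = 64, nhead: int = 4, layers: int = 2):
        super().__init__()
        # Observed points carry (x, y); candidate points carry only x.
        self.embed_obs = nn.Linear(dim_x + 1, d_model)
        self.embed_cand = nn.Linear(dim_x, d_model)
        enc = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
        self.head = nn.Linear(d_model, 1)  # scalar acquisition value per candidate

    def forward(self, x_obs, y_obs, x_cand):
        # x_obs: (B, N, dim_x), y_obs: (B, N, 1), x_cand: (B, M, dim_x)
        obs = self.embed_obs(torch.cat([x_obs, y_obs], dim=-1))
        cand = self.embed_cand(x_cand)
        h = self.encoder(torch.cat([obs, cand], dim=1))
        # Read acquisition values off the candidate tokens only.
        # (A faithful PFN would also mask candidate-to-candidate attention.)
        return self.head(h[:, x_obs.shape[1]:, :]).squeeze(-1)  # (B, M)

# Usage: one forward pass scores all candidates; training would regress these
# outputs onto (expensively precomputed) Entropy Search information gains.
model = AlphaPFNSketch(dim_x=2)
x_obs = torch.randn(1, 8, 2)
y_obs = torch.randn(1, 8, 1)
x_cand = torch.rand(1, 100, 2)
alpha = model(x_obs, y_obs, x_cand)      # (1, 100) acquisition values
x_next = x_cand[0, alpha[0].argmax()]    # candidate with the highest value
```

The point of this design is the one the abstract emphasizes: once training has amortized the expensive information-gain computation, proposing the next evaluation point costs a single forward pass rather than a sampling-based approximation.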
Submission Number: 77