$\alpha$-PFN: Fast Entropy Search via In-Context Learning

ICLR 2026 Conference Submission 16620 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Prior-data Fitted Networks, Bayesian optimization, entropy search, transformer, meta-learning, information-theoretic acquisition functions, in-context learning
TL;DR: We use the framework of Prior-data Fitted Networks (PFNs) to develop the $\alpha$-PFN, a transformer that learns to approximate the entropy search acquisition function in a single forward pass, enabling fast Bayesian optimization.
Abstract: Information-theoretic acquisition functions such as Entropy Search (ES) offer a principled exploration–exploitation framework for Bayesian optimization (BO). In practice, however, they rely on complicated and slow approximations, such as Monte Carlo estimation of the information gain. This complexity can introduce numerical errors and requires specialized, hand-crafted implementations. We propose a two-stage amortization strategy that uses Prior-data Fitted Networks (PFNs) to learn to approximate entropy search-based acquisition functions in a single forward pass. First, a PFN is trained so that it can be conditioned on information about the optimum; second, the $\alpha$-PFN is trained to predict the expected information gain, using information gains measured with the first PFN as regression targets. The $\alpha$-PFN thus offers a scalable, learnable approximation that replaces the complex estimators with a single forward pass per candidate, enabling rapid and extensible acquisition evaluation. Empirically, our approach is competitive with state-of-the-art entropy search implementations on synthetic and real-world benchmarks while accelerating the different entropy search variants by a factor of at least 12x, with the largest speedups of around 30x on the highest-dimensional (8D) problems.
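For context, entropy search scores a candidate $x$ by the expected reduction in the entropy of the belief over the optimizer's location, $\alpha_{\mathrm{ES}}(x) = H\big[p(x^\star \mid \mathcal{D})\big] - \mathbb{E}_{y \sim p(y \mid x, \mathcal{D})}\big[H\big[p(x^\star \mid \mathcal{D} \cup \{(x, y)\})\big]\big]$; this is the quantity the $\alpha$-PFN learns to output directly. The PyTorch sketch below illustrates only the second stage under assumed names and shapes (`SimpleAlphaPFN` and the random placeholder targets are illustrative assumptions, not the authors' architecture or training data): a transformer jointly encodes the observed context and the candidate points, then regresses onto information-gain targets that would, in the real pipeline, be measured with the stage-one PFN.

```python
# Minimal sketch of stage two of the amortization strategy described above.
# All names, shapes, and targets are illustrative assumptions.
import torch
import torch.nn as nn

class SimpleAlphaPFN(nn.Module):
    """Toy stand-in for the alpha-PFN: jointly encodes (x, y) context pairs
    and candidate x's with a transformer, and outputs one acquisition value
    per candidate in a single forward pass."""
    def __init__(self, dim_x, d_model=64, nhead=4, nlayers=2):
        super().__init__()
        self.ctx_embed = nn.Linear(dim_x + 1, d_model)   # embed (x, y) pairs
        self.cand_embed = nn.Linear(dim_x, d_model)      # embed candidate x's
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.head = nn.Linear(d_model, 1)                # acquisition value

    def forward(self, ctx_x, ctx_y, cand_x):
        # ctx_x: (B, N, D), ctx_y: (B, N, 1), cand_x: (B, M, D)
        ctx = self.ctx_embed(torch.cat([ctx_x, ctx_y], dim=-1))
        cand = self.cand_embed(cand_x)
        h = self.encoder(torch.cat([ctx, cand], dim=1))
        # Keep only the candidate positions: one score per candidate, (B, M).
        return self.head(h[:, ctx.shape[1]:]).squeeze(-1)

# Stage-two training loop: regress onto information gains that would be
# measured with the stage-one PFN (random placeholder targets here).
model = SimpleAlphaPFN(dim_x=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):
    ctx_x, ctx_y = torch.rand(8, 16, 2), torch.rand(8, 16, 1)
    cand_x = torch.rand(8, 32, 2)
    target_ig = torch.rand(8, 32)  # placeholder: measured info gains (assumption)
    loss = nn.functional.mse_loss(model(ctx_x, ctx_y, cand_x), target_ig)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At BO time, a trained model of this form scores all candidates with one forward pass instead of a Monte Carlo estimate per candidate, which is the source of the 12x-30x speedups reported in the abstract.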
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 16620