Position: The Future of Bayesian Prediction Is Prior-Fitted

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 Position Paper Track · Poster · CC BY 4.0
TL;DR: Training neural networks on data sampled from priors is the future of Bayesian prediction.
Abstract: Training neural networks on randomly generated artificial datasets yields Bayesian models that capture the prior defined by the dataset-generating distribution. Prior-data Fitted Networks (PFNs) are a class of methods designed to leverage this insight. In an era of rapidly increasing computational resources for pre-training and a near stagnation in the generation of new real-world data in many applications, PFNs are poised to play an increasingly important role across a wide range of applications: they enable the efficient allocation of pre-training compute to low-data scenarios. Originally applied to small Bayesian modeling tasks, the field of PFNs has expanded significantly to address more complex domains and larger datasets. This position paper argues that PFNs and other amortized inference approaches represent the future of Bayesian inference, leveraging amortized learning to tackle data-scarce problems, and that they are therefore a fruitful area of research. We explore their potential and discuss directions to address their current limitations.
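To make the abstract's core mechanism concrete, here is a minimal, hypothetical sketch of PFN-style pre-training in PyTorch. It is not the authors' implementation: the prior (random linear functions), the `TinyPFN` architecture, and all names are illustrative assumptions, and the attention masking of a real PFN is omitted for brevity.

```python
# Minimal PFN pre-training sketch (illustrative assumptions throughout).
import torch
import torch.nn as nn

def sample_dataset_from_prior(n_points=20, batch=64):
    """Draw synthetic datasets from a toy prior: y = w*x + b + noise, w, b ~ N(0, 1)."""
    x = torch.rand(batch, n_points, 1) * 2 - 1
    w = torch.randn(batch, 1, 1)
    b = torch.randn(batch, 1, 1)
    y = w * x + b + 0.1 * torch.randn(batch, n_points, 1)
    return x, y

class TinyPFN(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Linear(2, d_model)        # embed observed (x, y) pairs
        self.embed_query = nn.Linear(1, d_model)  # embed query x (y unknown)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 2)         # predictive mean and log-variance

    def forward(self, x_ctx, y_ctx, x_query):
        # (A real PFN masks attention so targets attend only to the context.)
        ctx = self.embed(torch.cat([x_ctx, y_ctx], dim=-1))
        qry = self.embed_query(x_query)
        h = self.encoder(torch.cat([ctx, qry], dim=1))
        return self.head(h[:, ctx.shape[1]:])     # predictions at the query points

model = TinyPFN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1000):
    x, y = sample_dataset_from_prior()
    n_ctx = 15                                    # split each dataset: context / targets
    out = model(x[:, :n_ctx], y[:, :n_ctx], x[:, n_ctx:])
    mean, log_var = out.chunk(2, dim=-1)
    # Gaussian negative log-likelihood of held-out targets: minimizing this
    # over datasets sampled from the prior trains the network to approximate
    # the posterior predictive distribution under that prior.
    loss = (0.5 * log_var + 0.5 * (y[:, n_ctx:] - mean) ** 2 / log_var.exp()).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```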
Lay Summary: The increasing computational resources available for neural network training, e.g., in data centers, are commonly used to train neural networks for longer. At some point, however, this requires substantial amounts of data, since training on the same data too many times tends not to yield further improvements. We advocate for a particular method (PFNs) to use the growing compute resources to improve the performance of neural networks in domains with little available data. Training is not performed on the available real-world data but instead on randomly generated synthetic data. The neural network is then only conditioned on the real-world data, similar to how ChatGPT is conditioned on your question, and yields predictions on the fly.
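The lay summary's point that the network is merely conditioned on real data, not trained on it, corresponds to a single forward pass of the sketch above. A hypothetical usage example (all values illustrative):

```python
# Conditioning the pre-trained network on real observations at inference
# time, with no gradient updates -- analogous to prompting a language model.
with torch.no_grad():
    x_real = torch.tensor([[[-0.5], [0.0], [0.5]]])  # three observed inputs
    y_real = torch.tensor([[[0.2], [0.5], [0.8]]])   # their observed outputs
    x_new = torch.tensor([[[1.0]]])                  # point to predict
    mean, log_var = model(x_real, y_real, x_new).chunk(2, dim=-1)
    print(mean.item(), log_var.exp().sqrt().item())  # predictive mean and std
```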
Primary Area: Research Priorities, Methodology, and Evaluation
Keywords: PFNs, Bayesian inference, deep learning, transformer
Submission Number: 529