Can Transformers Learn Full Bayesian Inference in Context?

Published: 01 May 2025. Last Modified: 18 Jun 2025. ICML 2025 poster. License: CC BY 4.0
TL;DR: We show that transformers can effectively solve full Bayesian inference in context.
Abstract: Transformers have emerged as the dominant architecture in the field of deep learning, with a broad range of applications and remarkable in-context learning (ICL) capabilities. While not yet fully understood, ICL has already proven to be an intriguing phenomenon, allowing transformers to learn in context, without requiring further training. In this paper, we further advance the understanding of ICL by demonstrating that transformers can perform full Bayesian inference for commonly used statistical models in context. More specifically, we introduce a general framework that builds on ideas from prior-data fitted networks and continuous normalizing flows and enables us to infer complex posterior distributions for models such as generalized linear models and latent factor models. Extensive experiments on real-world datasets demonstrate that our ICL approach yields posterior samples that are similar in quality to those of state-of-the-art MCMC or variational inference methods that do not operate in context. The source code for this paper is available at https://github.com/ArikReuter/ICL_for_Full_Bayesian_Inference.
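To make the recipe in the abstract concrete, below is a minimal, self-contained PyTorch sketch of the general idea: train a transformer on datasets simulated from a prior (in the spirit of prior-data fitted networks) and attach a continuous-normalizing-flow head, trained here with a flow-matching objective, that maps Gaussian noise to posterior samples conditioned on the in-context dataset. All specifics, the Bayesian linear-regression prior, the architecture sizes, and the flow-matching loss, are illustrative assumptions and not the authors' actual implementation; see the linked repository for that.

```python
# Illustrative sketch only: in-context amortized posterior inference,
# combining a PFN-style transformer encoder with a CNF trained by
# flow matching. All modeling choices here are assumptions.

import torch
import torch.nn as nn

def sample_task(n_obs=32, dim=4):
    """Draw one Bayesian linear-regression task: theta ~ N(0, I),
    x ~ N(0, I), y = x @ theta + noise."""
    theta = torch.randn(dim)
    x = torch.randn(n_obs, dim)
    y = x @ theta + 0.1 * torch.randn(n_obs)
    return torch.cat([x, y[:, None]], dim=-1), theta  # (n_obs, dim+1), (dim,)

class InContextPosteriorFlow(nn.Module):
    def __init__(self, dim=4, d_model=64):
        super().__init__()
        self.embed = nn.Linear(dim + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Velocity field of the CNF, conditioned on the dataset summary.
        self.velocity = nn.Sequential(
            nn.Linear(dim + 1 + d_model, 128), nn.GELU(), nn.Linear(128, dim)
        )

    def summarize(self, data):            # data: (batch, n_obs, dim+1)
        h = self.encoder(self.embed(data))
        return h.mean(dim=1)              # permutation-invariant summary

    def forward(self, theta_t, t, summary):
        return self.velocity(torch.cat([theta_t, t, summary], dim=-1))

def flow_matching_loss(model, data, theta):
    """Regress the velocity field onto the straight-line direction of a
    linear interpolation from base noise z to the true latent theta."""
    z = torch.randn_like(theta)
    t = torch.rand(theta.shape[0], 1)
    theta_t = (1 - t) * z + t * theta
    pred = model(theta_t, t, model.summarize(data))
    return ((pred - (theta - z)) ** 2).mean()

@torch.no_grad()
def sample_posterior(model, data, n_samples=100, steps=50):
    """Draw posterior samples by Euler-integrating the learned velocity
    field from the Gaussian base (t=0) to the target (t=1)."""
    summary = model.summarize(data).expand(n_samples, -1)
    theta = torch.randn(n_samples, data.shape[-1] - 1)
    for i in range(steps):
        t = torch.full((n_samples, 1), i / steps)
        theta = theta + model(theta, t, summary) / steps
    return theta
```

One design point worth noting: mean-pooling the transformer's token representations yields a summary that is invariant to the ordering of the dataset rows, which matches the exchangeability assumption of the statistical models the paper targets.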
Lay Summary: Large Language Models (LLMs), such as the one behind ChatGPT, have become widely used and commercially successful. A key reason for their success is their ability to perform in-context learning (ICL): given only a few examples or instructions in the input, they can solve complex tasks without needing to change their internal parameters. In this work, we explore whether the abstract principle of ICL, learning directly from context, can also be applied to a very different challenge: performing full Bayesian inference, a core task in statistics and machine learning. Traditionally, full Bayesian inference requires either very costly computations or relies on approximations that may compromise accuracy. We show that for three widely used statistical models, an ICL-based approach can achieve results comparable to expensive, exact methods while outperforming commonly used approximations. In summary, our results validate that ICL is a meaningful principle for full Bayesian inference and might therefore become a general and promising approach for solving difficult inference problems in science and engineering.
Link To Code: https://github.com/ArikReuter/ICL_for_Full_Bayesian_Inference
Primary Area: Probabilistic Methods
Keywords: In-Context Learning, Prior-Data Fitted Networks, Bayesian Inference
Submission Number: 599