Pre-training and In-context Learning IS Bayesian Inference a la De Finetti

ICLR 2024 Workshop ME-FoMo, Submission 87

Published: 04 Mar 2024, Last Modified: 05 May 2024. ME-FoMo 2024 Poster. License: CC BY 4.0
Keywords: In-context Learning, Pre-training, Bayesian inference
TL;DR: Pre-training and In-context Learning IS Bayesian Inference a la De Finetti
Abstract: In-context learning (ICL) has emerged as a powerful learning paradigm. Going back to De Finetti’s work on Bayesian inference using observables—as opposed to priors on latent factors/parameters—we establish an \emph{explicit} equivalence between ICL and Bayesian inference \emph{a la} De Finetti. From this view, pre-training is precisely empirical Bayes: it optimizes the marginal likelihood of observed sequences; whereas conventional empirical Bayes fits priors, pre-training fits posterior predictives using transformers. Our observation highlights previously under-explored capabilities of ICL: statistical inference and uncertainty quantification. Our theory highlights the importance of predictive coherence and motivates a new regularizer for pre-training sequence models to be logically coherent Bayesian statisticians. Our preliminary empirical results demonstrate that coherency regularization can substantially improve the inferential capabilities of ICL.
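For context, the equivalence the abstract refers to can be sketched as follows; the notation below is ours and is only an illustration of the standard De Finetti representation, not an excerpt from the paper:

\[
p(x_1,\dots,x_n) \;=\; \int \prod_{t=1}^{n} p(x_t \mid \theta)\, \pi(d\theta)
\;=\; \prod_{t=1}^{n} p(x_t \mid x_{1:t-1}),
\]

where the left-hand factorization holds for exchangeable sequences (De Finetti) and the right-hand one is the autoregressive chain rule, so each factor \(p(x_t \mid x_{1:t-1})\) is a Bayesian posterior predictive. Maximizing the pre-training objective \(\sum_{t=1}^{n} \log p_\phi(x_t \mid x_{1:t-1})\) over observed sequences is then marginal-likelihood (empirical Bayes) optimization, with the transformer \(p_\phi\) parameterizing the posterior predictives directly rather than a prior over latent parameters. Coherence, in this reading, asks the fitted predictives to be consistent with a single joint distribution, for instance invariant to permutations of an exchangeable conditioning context.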
Submission Number: 87