Transcending Bayesian Inference: Transformers Extrapolate Rules Compositionally Under Model Misspecification
Keywords: language models, OOD generalization, implicit Bayesian inference, compositional generalization
TL;DR: We demonstrate experimentally that Transformers pre-trained for implicit Bayesian inference can often transcend this behaviour in OOD settings, especially in compositional tasks.
Abstract: LLMs' intelligent behaviour, such as emergent reasoning and in-context learning abilities, has been interpreted as implicit Bayesian inference (IBI). Under IBI, the model treats the training data as a mixture, infers the latent parameters underlying a prompt, and makes predictions on in-distribution data consistent with explicit Bayesian inference. When test prompts are out-of-distribution, Bayesian inference over the training mixture components becomes suboptimal due to model misspecification. We pre-train Transformer models for implicit Bayesian inference and investigate whether they can transcend this behaviour under model misspecification. Our experiments demonstrate that Transformers generalize compositionally, even when the Bayesian posterior is undefined. We hypothesize that this behavior arises because Transformers learn general algorithms rather than merely fitting the training mixture.
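To make the "explicit Bayesian inference" baseline in the abstract concrete, here is a minimal illustrative sketch (not the paper's code) of posterior-predictive inference over a hypothetical two-component Bernoulli mixture; the biases and prior are assumptions for illustration:

```python
# Illustrative sketch: explicit Bayesian inference over a two-component
# mixture of Bernoulli sources -- the behaviour that implicit Bayesian
# inference (IBI) attributes to a pre-trained Transformer.
import numpy as np

# Hypothetical training mixture: two coins with known biases and a prior.
biases = np.array([0.2, 0.8])   # P(heads) under each mixture component
prior = np.array([0.5, 0.5])    # prior over components

def posterior_predictive(observations):
    """P(next token = heads | observations) under the mixture posterior."""
    heads = sum(observations)
    tails = len(observations) - heads
    # Likelihood of the observed sequence under each component.
    lik = biases**heads * (1 - biases)**tails
    post = prior * lik
    post /= post.sum()
    # Predictive: average the component biases under the posterior.
    return float(post @ biases)

# A mostly-heads prompt shifts posterior mass toward the 0.8 coin.
print(posterior_predictive([1, 1, 1, 0, 1]))  # ~0.79
```

An OOD prompt (e.g. a sequence implying a bias of 0.5, outside both components) is exactly the misspecified setting the paper studies: the posterior over the two components still exists here, but its predictive is a poor fit to the true source.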
Submission Number: 10