Large Language Models as Generative Bayesian Policies

Bojana Ranković; Philippe Schwaller

Published: 30 May 2026, Last Modified: 30 May 2026 | ICML2026-AI4Science Spotlight | CC BY 4.0

Track: Track 1: Original Research/Position/Education/Attention Track

Keywords: Bayesian optimization, large language models, preference optimization, Gaussian processes, deep kernel learning, active learning, scientific discovery, molecular design, reaction optimization, peptide design

TL;DR: We train an LLM jointly with a GP, then use the GP's acquisition function to finetune the LLM toward high-utility candidates.

Abstract: Bayesian optimization is the dominant framework for sample-efficient scientific discovery, providing principled uncertainty estimates that guide expensive experiments. However, its reliance on hand-crafted, domain-specific representations limits broad applicability. Large language models (LLMs) offer flexibility as many experimental setups can be described in text, but they lack the calibrated uncertainty required in high-stakes scientific applications. Here we introduce a framework that integrates language models and Bayesian optimization at both the representation and the generation level. We train a Gaussian process critic with deep kernel learning, using the LLM as the feature extractor to produce task-adaptive, uncertainty-calibrated representations. The same critic's acquisition function then provides a preference signal that trains the LLM to act as a generative Bayesian policy: candidates favored by the acquisition function become "preferred" completions for preference-based finetuning. The actor learns to generate high-utility candidates directly, thereby amortizing the acquisition search of standard Bayesian optimization into the language model itself. We evaluate on three scientific optimization benchmarks spanning process chemistry, peptide design, and molecular design. The actor's generation distribution shifts progressively toward higher-utility regions across iterations, and the trained model outperforms feature-based Bayesian optimization, LLM-guided optimization, and their union at different levels of integration. These results suggest that calibrated uncertainty can become the training signal that aligns language models with the decision-making demands of experimental discovery.
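
To make the preference signal concrete, the following is a minimal sketch of how acquisition scores could be turned into "preferred" and "rejected" completions. The abstract does not say which acquisition function or preference objective is used, so expected improvement and the prompt/chosen/rejected record format are assumptions here. The function names `expected_improvement` and `build_preference_pairs` are hypothetical, and the posterior means and standard deviations are illustrative stand-ins for the output of a GP critic.

```python
import math
from itertools import combinations


def expected_improvement(mean, std, best_so_far):
    """Closed-form expected improvement (maximization) under a Gaussian posterior.

    Assumption: the source does not name its acquisition function; EI is
    used here only as a common example.
    """
    if std <= 0.0:
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf


def build_preference_pairs(prompt, candidates, best_so_far, margin=0.0):
    """Turn scored candidates into pairwise preference records.

    candidates: list of (text, posterior_mean, posterior_std) tuples, where
    the posterior would come from the GP critic (hypothetical inputs here).
    """
    scored = [
        (text, expected_improvement(mean, std, best_so_far))
        for text, mean, std in candidates
    ]
    pairs = []
    for (text_a, score_a), (text_b, score_b) in combinations(scored, 2):
        if abs(score_a - score_b) <= margin:
            continue  # skip near-ties: they carry little preference signal
        if score_a > score_b:
            chosen, rejected = text_a, text_b
        else:
            chosen, rejected = text_b, text_a
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


# Illustrative values only: candidate B has a lower mean than candidate C
# but much higher uncertainty, so EI ranks B above C.
candidates = [
    ("candidate A", 0.60, 0.05),
    ("candidate B", 0.40, 0.30),
    ("candidate C", 0.45, 0.01),
]
pairs = build_preference_pairs("Propose the next experiment:", candidates, best_so_far=0.5)
for pair in pairs:
    print(pair["chosen"], ">", pair["rejected"])
```

In this toy example the uncertain candidate B is preferred over the confident but unpromising candidate C, which is the kind of exploration-aware ranking that a purely mean-based reward would miss. Records of this shape could then feed a preference-based finetuning step, whose exact form the abstract leaves unspecified.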

Submission Number: 303
