Keywords: bayesian optimization, foundation model, experimental design, lab-in-the-loop
TL;DR: Foundation-model embeddings with neural surrogates and Thompson sampling give a unified Bayesian optimization recipe across proteins, DNA, RNA, and small molecules, beating GPs, in-context LLMs, and steered generative models on 61 benchmarks.
Title: Practical Bayesian Optimization for Scientific Discovery
Hamza Tahir Chaudhry, Sean H. Murphy, Umesh Padia, Cengiz Pehlevan, James Harrison, George Church, Jasper Snoek
07 May 2026 (modified: 07 May 2026)
ICML 2026 Workshop AI4Science Submission
CC BY 4.0
Track: Track 1: Original Research/Position/Education/Attention Track
Abstract:
Bayesian optimization (BO) is a standard tool for experimental scientific discovery, where evaluations are costly and candidate spaces are vast. Classical formulations often rely on methods that scale poorly with data size and are ill-suited to discrete sequences and molecules. Scientific foundation models now provide rich, transferable representations for these domains. However, it remains unclear how best to leverage them with BO in lab-in-the-loop campaigns, or how this approach compares with LLMs and generative modeling, two leading paradigms in AI-for-science. We investigate these questions across proteins, DNA, RNA, and small molecules through 61 regression tasks drawn from established experimental benchmarks. This constitutes, to our knowledge, the most extensive cross-domain study of foundation-model-driven Bayesian optimization to date. We systematically ablate foundation models, surrogates, acquisition functions, and fine-tuning regimes under both sequential and batched selection. We find that Gaussian process surrogates are consistently outperformed by neural alternatives paired with Thompson sampling, particularly MLP ensembles and variational Bayesian last layers. We further find that smaller batch sizes reach peak performance faster and recover more elite candidates under the same total experimental budget, a trend that is consistent across tasks and surrogate choices. Finally, we show that foundation-model-driven BO outperforms both an in-context LLM surrogate and a guided discrete diffusion model.
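As a rough illustration of the kind of loop the abstract describes (fixed embeddings, a neural ensemble surrogate, Thompson sampling over a candidate pool), the sketch below may help. It is not the authors' implementation. Everything in it is an assumption made for illustration: the embeddings are random vectors standing in for foundation-model outputs, the assay is a synthetic linear function, each ensemble member is a random-feature network with a ridge-regressed output layer (a cheap stand-in for a trained MLP), and the names `fit_member`, `thompson_select`, and `run_campaign` are hypothetical.

```python
import numpy as np

def fit_member(X, y, rng, hidden=64, ridge=1e-2):
    """Fit one ensemble member: random tanh features plus a ridge output layer."""
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap resample for diversity
    W = rng.normal(size=(X.shape[1], hidden)) / np.sqrt(X.shape[1])
    b = rng.normal(size=hidden)
    H = np.tanh(X[idx] @ W + b)
    A = H.T @ H + ridge * np.eye(hidden)  # regularized normal equations
    beta = np.linalg.solve(A, H.T @ y[idx])
    return W, b, beta

def predict(member, X):
    W, b, beta = member
    return np.tanh(X @ W + b) @ beta

def thompson_select(X_pool, labelled, y_obs, rng, n_members=8, batch_size=1):
    """Choose batch_size unlabelled pool indices by ensemble Thompson sampling."""
    members = [fit_member(X_pool[labelled], y_obs, rng) for _ in range(n_members)]
    available = np.ones(len(X_pool), dtype=bool)
    available[labelled] = False
    chosen = []
    for _ in range(batch_size):
        # Drawing one member at random approximates one posterior sample.
        member = members[rng.integers(n_members)]
        scores = predict(member, X_pool)
        scores[~available] = -np.inf  # never re-select a measured candidate
        best = int(np.argmax(scores))
        chosen.append(best)
        available[best] = False
    return chosen

def run_campaign(seed=0, n_pool=500, dim=16, n_init=10, rounds=20, batch_size=2):
    """Simulate a lab-in-the-loop campaign on synthetic data (assumed setup)."""
    rng = np.random.default_rng(seed)
    X_pool = rng.normal(size=(n_pool, dim))  # placeholder for real embeddings
    w_true = rng.normal(size=dim)
    y_true = X_pool @ w_true + 0.1 * rng.normal(size=n_pool)  # synthetic assay
    labelled = rng.choice(n_pool, size=n_init, replace=False).tolist()
    for _ in range(rounds):
        y_obs = y_true[labelled]  # "measure" everything selected so far
        labelled += thompson_select(X_pool, labelled, y_obs, rng,
                                    batch_size=batch_size)
    return labelled, y_true
```

With the default arguments, `run_campaign()` measures 10 initial candidates and then 20 rounds of 2, so 50 distinct pool entries in total. Changing `batch_size` while holding `rounds * batch_size` fixed is one simple way to probe the batch-size trade-off the abstract mentions, though nothing about this toy setup should be read as reproducing the reported results.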
Submission Number: 161