GOLLuM: Gaussian Process Optimized LLMs — Reframing LLM Finetuning through Bayesian Optimization

Published: 06 Mar 2025, Last Modified: 15 Apr 2025, ICLR 2025 Workshop World Models, CC BY 4.0
Keywords: LLM finetuning, Bayesian optimization, Chemical optimization, Gaussian processes, Deep Metric Learning
TL;DR: We finetune LLMs through the Gaussian process marginal likelihood, implicitly uncovering deep metric learning effects and yielding better sampling over the design space.
Abstract: Large Language Models (LLMs) can encode complex relationships in their latent spaces, yet harnessing them for optimization under uncertainty remains challenging. We address this gap with a novel architecture that reframes LLM finetuning as Gaussian process (GP) marginal likelihood optimization via deep kernel methods. We introduce LLM-based deep kernels, jointly optimized with GPs to preserve the benefits of both: LLMs provide a rich and flexible input space for Bayesian optimization, and GPs model this space with predictive uncertainty for more efficient sampling. Applied to Buchwald-Hartwig reaction optimization, our method nearly doubles the discovery rate of high-performing reactions compared to static LLM embeddings (from 24% to 43% coverage of the top 5% of reactions in just 50 optimization iterations). We also observe a 14% improvement over domain-specific representations without requiring specialized features. Extensive empirical evaluation across 19 benchmarks, ranging from general chemistry to reaction and molecular property optimization, demonstrates our method's robustness, generality, and consistent improvements across: (1) tasks, (2) LLM architectures (encoder, decoder, encoder-decoder), (3) pretraining domains (chemistry-related or general-purpose), and (4) hyperparameter settings (tuned once on a single dataset). Finally, we explain these improvements: joint LLM-GP optimization through the marginal likelihood implicitly performs contrastive learning, aligning representations to produce (1) better-structured embedding spaces, (2) improved uncertainty calibration, and (3) more efficient sampling, without requiring any external loss. This work provides both practical advances in sample-efficient optimization and insights into what makes Bayesian optimization effective.
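To make the central idea concrete, below is a minimal sketch (not the authors' code) of an LLM-based deep kernel GP trained jointly through the GP marginal likelihood, assuming GPyTorch and Hugging Face Transformers. The model name ("bert-base-uncased"), mean-pooling strategy, Matern kernel, toy data, and training hyperparameters are illustrative assumptions, not details from the paper.

```python
# Sketch: joint LLM-GP optimization via the GP marginal log likelihood.
# Assumptions: GPyTorch for the exact GP, Transformers for the LLM encoder.
import torch
import gpytorch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumption: any encoder LLM could be substituted


class LLMDeepKernelGP(gpytorch.models.ExactGP):
    """Exact GP whose kernel operates on LLM embeddings (a 'deep kernel')."""

    def __init__(self, train_inputs, train_y, likelihood, llm):
        super().__init__(train_inputs, train_y, likelihood)
        self.llm = llm  # LLM weights are finetuned jointly with GP hyperparameters
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(nu=2.5)
        )

    def embed(self, input_ids, attention_mask):
        # Mean-pool the last hidden state over non-padding tokens.
        out = self.llm(input_ids=input_ids, attention_mask=attention_mask)
        mask = attention_mask.unsqueeze(-1).float()
        return (out.last_hidden_state * mask).sum(1) / mask.sum(1)

    def forward(self, input_ids, attention_mask):
        z = self.embed(input_ids, attention_mask)
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z)
        )


# Toy data: text descriptions of candidates and their objective values (hypothetical).
texts = ["reaction with ligand A", "reaction with ligand B"]
y = torch.tensor([0.31, 0.87])

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
llm = AutoModel.from_pretrained(MODEL_NAME)
tokens = tokenizer(texts, padding=True, return_tensors="pt")
train_inputs = (tokens["input_ids"], tokens["attention_mask"])

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = LLMDeepKernelGP(train_inputs, y, likelihood, llm)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

# Joint optimization: LLM weights and GP hyperparameters share a single objective,
# the negative marginal log likelihood; no external (e.g., contrastive) loss is added.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
model.train(); likelihood.train()
for _ in range(10):
    optimizer.zero_grad()
    output = model(*train_inputs)
    loss = -mll(output, y)
    loss.backward()
    optimizer.step()
```

In a Bayesian optimization loop, such a model would be refit on the observed data at each iteration and its posterior mean and variance fed to an acquisition function to select the next candidate; this sketch only shows the joint training step under the assumptions stated above.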
Submission Number: 86