Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: Bayesian optimization, LLM-guided optimization, variable selection, sustainable proteins, plant-based food formulation, high-dimensional optimization, sparse optimization, closed-loop scientific discovery
TL;DR: Expert-Guided BO (EGBO): an LLM or human expert selects a low-dimensional subspace, and BO runs inside it. EGBO outperforms all tested baselines on FormulateBench (24 plant-based formulation tasks). Deployed with a food company, EGBO outperformed a professional food scientist on plant-based yogurt.
Abstract: Autonomous scientific discovery systems increasingly use LLMs to narrow design spaces before experiments are run, but this practice is double-edged: when the LLM is right, sample efficiency can improve dramatically; when it is wrong, the system can underperform random search. We formalize Expert-Guided Bayesian Optimization (EGBO), in which an expert, e.g. a human or LLM, selects a low-dimensional subspace for BO and may adaptively expand it over time.
We decompose EGBO's suboptimality into a selection gap and an optimization gap, and characterize the coverage–dimension tradeoff governing when expert guidance helps.
To support in silico prototyping before costly real-world deployment, we introduce FormulateBench, a suite of 24 plant-based formulation tasks, on which LLM-guided EGBO outperforms all tested baselines. When deployed to optimize two plant-based dairy products, EGBO improves utility, as assessed by a trained human panel, by 29% and 26% in 10 iterations each. In a comparison with a professional human food scientist given the same time budget, EGBO achieved near-perfect utility of 0.992, vs. 0.850 for the food scientist.
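The decomposition mentioned in the abstract can be read as a simple telescoping identity. This is a sketch under assumptions, not the paper's own statement: the symbols are illustrative and not taken from the source. Let $f$ be the utility to maximize over the full design space $\mathcal{X}$, let $S \subseteq \mathcal{X}$ be the expert-selected subspace, and let $x_T \in S$ be the best point BO has found after $T$ iterations.

```latex
\underbrace{\max_{x \in \mathcal{X}} f(x) - f(x_T)}_{\text{suboptimality}}
  \;=\;
\underbrace{\max_{x \in \mathcal{X}} f(x) - \max_{x \in S} f(x)}_{\text{selection gap}}
  \;+\;
\underbrace{\max_{x \in S} f(x) - f(x_T)}_{\text{optimization gap}}
```

Under this reading, enlarging $S$ can only shrink the selection gap, while a higher-dimensional $S$ typically makes the optimization gap harder to close within a fixed evaluation budget, which is one way to view the coverage–dimension tradeoff.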
Submission Number: 121