Generalists vs. Specialists: Evaluating LLMs on Highly-Constrained Biophysical Sequence Optimization Tasks
Abstract: Although large language models (LLMs) have shown promise in biomolecule optimization problems, they incur heavy computational costs and struggle to satisfy precise constraints. On the other hand, specialized solvers like LaMBO-2 offer efficiency and fine-grained control but require more domain expertise.
Comparing these approaches is challenging due to expensive laboratory validation and inadequate synthetic benchmarks.
We address this by introducing Ehrlich functions, a synthetic test suite that captures the geometric structure of biophysical sequence optimization problems.
With prompting alone, off-the-shelf LLMs struggle to optimize Ehrlich functions.
In response, we propose LLOME (Language Model Optimization with Margin Expectation), a bilevel optimization routine for online black-box optimization.
When combined with a novel preference learning loss, LLOME not only learns to solve some Ehrlich functions,
but can even perform as well as or better than LaMBO-2 on moderately difficult Ehrlich variants.
However, LLMs also exhibit some likelihood-reward miscalibration and struggle without explicit rewards.
Our results indicate LLMs can occasionally provide significant benefits, but specialized solvers are still competitive and incur less overhead.
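To make the setup concrete, the following is a minimal toy sketch of a constrained sequence objective in the spirit described above; the motifs, spacings, and infeasibility rule are our own illustrative assumptions, not the exact Ehrlich definition. The score is the fraction of required spaced motifs a sequence contains, and any banned adjacent character pair makes the sequence infeasible.

from typing import List, Set, Tuple

def toy_constrained_objective(
    seq: str,
    motifs: List[str],                  # required subsequences, e.g. ["AK", "WW"]
    spacings: List[int],                # gap between consecutive motif characters
    banned_pairs: Set[Tuple[str, str]], # adjacent pairs that make a sequence infeasible
) -> float:
    """Return -inf for infeasible sequences, else the fraction of motifs satisfied."""
    # Hard constraint: any banned adjacent pair renders the sequence infeasible.
    for a, b in zip(seq, seq[1:]):
        if (a, b) in banned_pairs:
            return float("-inf")
    satisfied = 0
    for motif, gap in zip(motifs, spacings):
        # A motif with spacing g must appear with its characters separated by
        # exactly g positions, starting anywhere in the sequence.
        step = gap + 1
        span = (len(motif) - 1) * step + 1
        for start in range(len(seq) - span + 1):
            if all(seq[start + i * step] == motif[i] for i in range(len(motif))):
                satisfied += 1
                break
    return satisfied / len(motifs)

# Example: one contiguous motif and one motif with a single-position gap.
print(toy_constrained_objective(
    "AKLWGW",
    motifs=["AK", "WW"],
    spacings=[0, 1],
    banned_pairs={("P", "P")},
))  # -> 1.0

In a realistic instance the motifs, spacings, and banned pairs would presumably be generated programmatically so the landscape is non-trivial while still admitting feasible solutions; the actual Ehrlich construction is specified in the paper and the linked code.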
Lay Summary: We explore the application of large language models (LLMs) to biomolecule design, a task traditionally tackled by specialized solvers. While LLMs demonstrate potential on these tasks, their computational cost and difficulty in meeting precise constraints pose challenges. To address the lack of suitable benchmarks for comparing LLMs and specialized tools, we introduce Ehrlich functions, a synthetic test suite that reflects the complexities of real-world biophysical sequence optimization problems while avoiding the costs of wet lab experiments. Our initial results show that LLMs struggle to solve these functions through prompting alone. To address this, we propose LLOME, a novel LLM-based optimization algorithm, and demonstrate that it can perform competitively with, or even outperform, the specialized tool LaMBO-2 on moderately challenging problems. Despite this, LLMs exhibit some inconsistencies and require high-quality data. Our findings suggest that LLMs can offer advantages in both effectiveness and usability, but specialized solvers remain a strong alternative, especially when considering efficiency.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/prescient-design/llome
Primary Area: Deep Learning->Large Language Models
Keywords: biophysical optimization, large language models, discrete sequence optimization
Submission Number: 12752