LaPep: Can Language Contribute to Property-Guided Peptide Design?
Track: long paper (up to 10 pages)
Keywords: peptide design, language-guided generation, property constraints, discrete flow matching
TL;DR: LaPep evaluates whether language models can meaningfully guide therapeutic peptide generation under hard property constraints and shows they often cannot without reliable predictors.
Abstract: Large language models (LLMs) encode broad chemical heuristics from the scientific literature and are increasingly proposed as tools for therapeutic molecule design. However, their effectiveness in generating therapeutically viable peptides, particularly in the absence of strong labeled predictors, remains unclear. We introduce **LaPep**, a sampling-time framework that integrates LLMs as token-level proposers within a discrete flow-based peptide generator, while using hard property predictors to guide and evaluate generation. Using open-source LLMs including Qwen3, Kimi K2, and Llama 3, we study two representative design settings: permeability, where a strong predictor exists, and protease stability, where one does not. We show that language guidance can improve permeability when combined with a hard predictor, but yields limited or inconsistent gains for protease stability when used alone, even when leveraging external heuristic scorers. These results indicate that current LLMs are not yet reliable substitutes for quantitative property models in therapeutic peptide design. We position LaPep as a diagnostic framework for systematically evaluating the capabilities and limitations of language models in guided molecular generation, and argue that high-quality labeled predictors remain critical for translating language-driven design into therapeutically relevant outcomes.
Presenter: ~Kimberly_Liang1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 26