LaPep: Can Language Contribute to Property-Guided Peptide Design?
Track: long paper (up to 10 pages)
Keywords: peptide design, language-guided generation, property constraints, discrete flow matching
TL;DR: LaPep evaluates whether language models can meaningfully guide therapeutic peptide generation under hard property constraints and shows they often cannot without reliable predictors.
Abstract: Large language models (LLMs) encode broad chemical heuristics from the scientific literature and are increasingly proposed as tools for therapeutic molecule design. However, their effectiveness in generating therapeutically viable peptides, particularly in the absence of strong labeled predictors, remains unclear. We introduce **LaPep**, a sampling-time framework that integrates LLMs as token-level proposers within a discrete flow-based peptide generator, while using hard property predictors to guide and evaluate generation. Using open-source LLMs including Qwen3, Kimi K2, and Llama 3, we study two representative design settings: permeability, where a strong predictor exists, and protease stability, where one does not. We show that language guidance can improve permeability when combined with a hard predictor, but yields limited or inconsistent gains for protease stability when used alone, even when leveraging external heuristic scorers. These results indicate that current LLMs are not yet reliable substitutes for quantitative property models in therapeutic peptide design. We position LaPep as a diagnostic framework for systematically evaluating the capabilities and limitations of language models in guided molecular generation, and argue that high-quality labeled predictors remain critical for translating language-driven design into therapeutically relevant outcomes.
Presenter: ~Kimberly_Liang1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 26