InstructPro: Natural Language Guided Ligand-Binding Protein Design

ICLR 2026 Conference Submission20004 Authors

19 Sept 2025 (modified: 08 Oct 2025)ICLR 2026 Conference SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Keywords: instruction following, natural language guidance, functional protein design
Abstract: Designing ligand-binding proteins with precise functions is fundamental to advances in biology and chemistry, yet existing AI approaches are limited by scarce protein–ligand complex data. Meanwhile, abundant text descriptions of protein–ligand interactions remain underutilized. We introduce InstructPro, a family of generative models that design proteins from natural language instructions and ligand formulas. InstructPro produces protein sequences consistent with specified functional descriptions and ligand targets. To enable training and evaluation, we develop InstructProBench, a large-scale dataset of 9.6 million (function description, ligand, protein) triples. We train two model variants -- InstructPro-1B and InstructPro-3B -- that substantially outperform strong baselines. InstructPro-1B achieves design success rates of 2.46% (seen ligands) and 3.14% (zero-shot), while InstructPro-3B reaches 5.06% and 3.93%, respectively. These results demonstrate the potential of natural language–guided generative modeling to expand protein design capabilities beyond traditional data limitations.
Supplementary Material: zip
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 20004
Loading