Co-Generative De Novo Functional Protein Design

Published: 30 Apr 2026, Last Modified: 24 Jun 2026 | ICML 2026 regular | CC BY 4.0
TL;DR: We propose CodeFP, a model that integrates functional constraints with simultaneous sequence-structure decoding to generate proteins that are both functional and foldable.
Abstract: *De novo* functional protein design aims to generate protein sequences that realize specified biochemical functions without relying on evolutionary templates, with broad applications in biotechnology and medicine. Existing approaches either map function directly to sequence or generate structure and sequence in a decoupled manner, and they often fail to achieve functionality and foldability at the same time. To address this, we propose **CodeFP**, a **Co**-generative protein language model for ***de** novo* **F**unctional **P**rotein design that decodes sequence and structure tokens simultaneously, which lets it better realize functionality and foldability together. CodeFP uses functional local structures to enrich functional semantic encodings, overcoming the suboptimal translation of flat encodings into structure tokens, and introduces auxiliary functional supervision to alleviate the training ambiguity that stems from the one-to-many structure-to-token mapping. Extensive experiments show that CodeFP consistently achieves average improvements of 6.1% in functional consistency and 3.2% in foldability over the strongest baseline.
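The abstract does not spell out the training objective or the decoding loop, so the following is only a hypothetical illustration of the two ideas it names, not the CodeFP implementation. It assumes that co-generation means emitting one amino-acid token and one discrete structure token per position from a shared model, and that "auxiliary functional supervision" is a multi-label function-classification loss added with a weight. All names (`co_generation_loss`, `greedy_co_decode`, `step_fn`, `aux_weight`) are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of `target` under softmax(logits), computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def bce_with_logits(logit: float, label: float) -> float:
    """Numerically stable binary cross-entropy on a single raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def co_generation_loss(seq_logits, seq_targets, struct_logits, struct_targets,
                       func_logits, func_labels, aux_weight: float = 0.5) -> float:
    """Hypothetical joint objective: sequence-token CE + structure-token CE
    + aux_weight * multi-label functional BCE (the weighting is an assumption)."""
    n = len(seq_targets)
    l_seq = sum(cross_entropy(l, t) for l, t in zip(seq_logits, seq_targets)) / n
    l_struct = sum(cross_entropy(l, t) for l, t in zip(struct_logits, struct_targets)) / n
    l_func = sum(bce_with_logits(z, y) for z, y in zip(func_logits, func_labels)) / len(func_labels)
    return l_seq + l_struct + aux_weight * l_func


Pair = Tuple[int, int]  # (amino-acid token, structure token)


def greedy_co_decode(step_fn: Callable[[List[Pair]], Tuple[Sequence[float], Sequence[float]]],
                     length: int) -> List[Pair]:
    """Emit one (sequence, structure) token pair per position.

    `step_fn` is a hypothetical model call that sees all previously emitted pairs
    and returns (sequence logits, structure logits) for the next position.
    """
    pairs: List[Pair] = []
    for _ in range(length):
        seq_logits, struct_logits = step_fn(pairs)
        aa = max(range(len(seq_logits)), key=seq_logits.__getitem__)
        st = max(range(len(struct_logits)), key=struct_logits.__getitem__)
        pairs.append((aa, st))
    return pairs
```

Because every step conditions on both token streams emitted so far, sequence and structure choices can inform each other, in contrast to a pipeline that fixes the structure first and then fits a sequence to it.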
Lay Summary: Scientists are trying to use AI to "customize" new proteins with specific functions, such as new drugs or plastic-degrading enzymes. However, this is extremely difficult. Current AI tools either generate only the chemical sequence, much like writing text (and the result often fails to fold into a real 3D shape), or design a 3D shape first and then force a sequence into it (which often leads to a mismatch and failure). To break this bottleneck, we developed an AI model called CodeFP. Like a sculptor working with both hands, it builds the protein's chemical sequence and its 3D spatial structure at the same time. We also introduced the concept of "functional building blocks" to the model, teaching it to translate abstract biological tasks directly into specific physical shapes. With this approach, the proteins generated by CodeFP not only fold stably but also perform their intended biological functions much more accurately than those from previous models. This advance could make designing brand-new proteins for medicine and bioengineering more efficient and reliable.
Link To Code: https://github.com/PharMolix/OpenBioMed
Primary Area: Applications->Health / Medicine
Keywords: function, protein design, co-generation
Submission Number: 2252