Keywords: protein backbone generation, de novo protein design, diffusion distillation
TL;DR: Few-step distillation for protein backbone generators achieves high designability while drastically reducing sampling time.
Abstract: Diffusion- and flow-based generative models have recently demonstrated strong performance in protein backbone generation, offering unprecedented capabilities for $\textit{de novo}$ protein design. However, while these models achieve notable generation quality, they are limited by their sampling speed, often requiring hundreds of iterative steps in the reverse-diffusion process. This computational bottleneck limits their practical utility in large-scale protein discovery, where thousands to millions of candidate structures are needed. To address this challenge, we explore score distillation techniques, which have shown great success in reducing the number of sampling steps in the vision domain while maintaining high generation quality. However, a straightforward adaptation of these methods results in unacceptably low designability. Through an extensive study, we identify how to appropriately adapt Score identity Distillation (SiD), a state-of-the-art score distillation strategy, to train few-step protein backbone generators that significantly reduce sampling time while maintaining performance comparable to their pretrained teacher model. In particular, multistep generation combined with inference-time noise modulation is key to this success. We demonstrate that our distilled few-step generators achieve more than a 20-fold improvement in sampling speed while matching the designability, diversity, and novelty of the $\textit{Proteína}$ teacher model. This reduction in inference cost enables large-scale $\textit{in silico}$ protein design, thereby bringing diffusion-based models closer to real-world protein engineering applications.
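To illustrate the kind of sampling loop the abstract refers to, here is a minimal PyTorch-style sketch of multistep generation with inference-time noise modulation. The `few_step_sample` function, the linear noise schedule, and the `noise_scale` knob are illustrative assumptions standing in for the paper's actual implementation, which is not shown on this page.

```python
import torch

def few_step_sample(generator, num_steps=4, noise_scale=0.7, shape=(1, 128, 3)):
    """Multistep sampling with inference-time noise modulation (a sketch).

    `generator` is assumed to be a distilled few-step model G(x_t, t) that
    directly predicts a clean sample from a noisy input. `noise_scale` (< 1)
    damps the noise re-injected between steps; it is a hypothetical knob
    standing in for the paper's noise modulation.
    """
    # Decreasing noise levels t_1 > t_2 > ... > t_K (hypothetical linear schedule).
    ts = torch.linspace(1.0, 0.0, num_steps + 1)[:-1]
    x = torch.randn(shape)  # start from pure Gaussian noise
    for i, t in enumerate(ts):
        x0 = generator(x, t)  # one-shot prediction of the clean backbone
        if i + 1 < len(ts):
            t_next = ts[i + 1]
            # Re-noise the prediction down to the next noise level; the
            # modulation factor shrinks the injected noise relative to the
            # standard forward process.
            x = x0 + noise_scale * t_next * torch.randn_like(x0)
        else:
            x = x0
    return x
```

With `num_steps` on the order of 2 to 8, a loop of this shape replaces the hundreds of reverse-diffusion iterations mentioned above, which is where the quoted 20-fold speedup would come from under these assumptions.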
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 14068