Keywords: guided sampling, flow matching, protein design
Abstract: Deep generative models have achieved substantial success in protein design. A prevalent approach to de novo protein design first generates a protein backbone structure with a deep generative model, such as a diffusion or flow model, and then uses a separate inverse folding model to design the corresponding sequence. Recently, co-design methods, which aim to jointly generate the structure and sequence of a protein, have attracted considerable attention. Even so, co-designing the sequences and structures of long proteins remains challenging. The complexity of this high-dimensional multimodal generative modeling makes sampling from diffusion and flow models prone to accumulated errors, which often lead to non-designable regions. To tackle this challenge, we introduce a contrastive guided sampling algorithm with dual multimodal flows to sample both sequences and structures of highly designable proteins. The contrastive guidance uses the lower-quality flow to help the higher-quality flow avoid non-designable regions by gently steering it during sampling. Our method achieves designability of 80% for length-400 proteins and 37% for length-500 proteins, significantly outperforming previous approaches.
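The steering idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the guidance rule, so the extrapolation form `v_good + w * (v_good - v_bad)`, the weight `w`, and the toy velocity fields below are all assumptions chosen for illustration. The sketch also covers only a continuous state, whereas the described method handles both sequences and structures.

```python
import numpy as np

def guided_velocity(v_good, v_bad, w):
    # Assumed guidance rule: push the strong flow's velocity further
    # away from the weak flow's velocity. w = 0 disables guidance.
    return v_good + w * (v_good - v_bad)

def sample(x0, good_flow, bad_flow, w=0.5, n_steps=100):
    # Euler integration of the guided ODE from t = 0 to t = 1.
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * guided_velocity(good_flow(x, t), bad_flow(x, t), w)
    return x

# Hypothetical toy flows: straight-line paths toward fixed targets.
# The "bad" flow aims at a shifted target to mimic a lower-quality model.
target = np.array([1.0, 1.0])
bad_target = np.array([0.5, 1.5])

def good_flow(x, t):
    return (target - x) / (1.0 - t)

def bad_flow(x, t):
    return (bad_target - x) / (1.0 - t)

unguided = sample([0.0, 0.0], good_flow, bad_flow, w=0.0)
guided = sample([0.0, 0.0], good_flow, bad_flow, w=0.5)
```

In this toy setting the unguided sample lands on the strong flow's target, while a positive `w` moves the sample further from where the weak flow would have ended. Whether the actual method uses this rule, a time-dependent weight, or a different treatment of the discrete sequence is not specified in the abstract.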
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9999