SE(3)-Stochastic Flow Matching for Protein Backbone Generation

Published: 16 Jan 2024, Last Modified: 19 Mar 2024, ICLR 2024 spotlight
Keywords: Proteins; Equivariance; Riemannian; Flow Matching; Generative models
TL;DR: A flow matching approach for the de novo design of protein backbones
Abstract: The computational design of novel protein structures has the potential to greatly impact numerous scientific disciplines. Toward this goal, we introduce $\text{FoldFlow}$, a series of novel generative models of increasing modeling power based on the flow-matching paradigm over $3\mathrm{D}$ rigid motions (i.e., the group $\mathrm{SE(3)}$), enabling accurate modeling of protein backbones. We first introduce $\text{FoldFlow-Base}$, a simulation-free approach to learning deterministic continuous-time dynamics and matching invariant target distributions on $\mathrm{SE(3)}$. We next accelerate training by incorporating Riemannian optimal transport to create $\text{FoldFlow-OT}$, leading to the construction of flows that are both simpler and more stable. Finally, we design $\text{FoldFlow-SFM}$, coupling both Riemannian OT and simulation-free training to learn stochastic continuous-time dynamics over $\mathrm{SE(3)}$. Our family of $\text{FoldFlow}$ generative models offers several key advantages over previous approaches to the generative modeling of proteins: they are more stable and faster to train than diffusion-based approaches, and they can map any invariant source distribution to any invariant target distribution over $\mathrm{SE(3)}$. Empirically, we validate $\text{FoldFlow}$ on protein backbone generation of up to $300$ amino acids, leading to high-quality designable, diverse, and novel samples.
Primary Area: generative models
Submission Number: 3838