TopoDiff: Improving Protein Backbone Generation with Topology-aware Latent Encoding

Published: 27 Oct 2023, Last Modified: 29 Nov 2023, GenBio@NeurIPS2023 Spotlight
Keywords: diffusion model, protein design, geometric deep learning, representation learning
Abstract: The $\textit{de novo}$ design of protein structures is an intriguing research topic in protein engineering. Recent breakthroughs in diffusion-based generative models have shown substantial promise for this task, notably in generating diverse and realistic protein structures. While existing models predominantly focus on unconditional generation or fine-grained conditioning at the residue level, holistic, top-down approaches to controlling the overall topological arrangement remain insufficiently explored. In response, we introduce TopoDiff, a diffusion-based framework augmented by a global-structure encoding module. Without supervision, it learns a compact latent representation of natural protein topologies with interpretable characteristics and simultaneously uses this learned information for controllable protein structure generation. We also propose a novel metric specifically designed to assess the coverage of sampled proteins with respect to the natural protein space. In comparative analyses with existing models, our generative model not only demonstrates comparable performance on established metrics but also exhibits better coverage across the recognized topology landscape. In summary, TopoDiff offers a novel approach to improving the controllability and comprehensiveness of $\textit{de novo}$ protein structure generation, opening new possibilities for applications in protein engineering and beyond.
Supplementary Materials: zip
Submission Number: 60