DiffMaSIF: Score-Based Diffusion Models for Protein Surfaces

23 Sept 2023 (modified: 11 Feb 2024)Submitted to ICLR 2024EveryoneRevisionsBibTeX
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: geometric deep learning, protein-protein docking, diffusion model, equivariant network, protein surface
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: DiffMaSIF is a score-based diffusion model which advances rigid protein-protein docking with a novel surface-centric molecular representation
Abstract: Predicting protein-protein complexes is one of the central challenges of computational structural biology. Inspired by recent generative machine learning (ML) techniques which have shown promise in the realms of protein docking, we introduce DiffMaSIF, a novel score-based diffusion model for rigid protein-protein docking. While existing methods rely on co-evolution learned on a residue level, this information is not sufficient for transient, weakly evolved, or newly designed interfaces. DiffMaSIF’s efficacy hinges on its surface-based molecular representation which can capture the complementarity inherent in the physical surfaces of interacting protein interfaces. We follow an end-to-end two-tier prediction schema: initially identifying contact sites on the protein surface and constraining each molecular graph to these sites, followed by an equivariant network to position the two proteins. This data reduction step enables the use of more sophisticated networks and more training steps. In addition to developing this model, we introduce new dataset splits accounting for structural leakage at the interface and thus tailored for benchmarking protein-protein interface prediction performance. Our results demonstrate that DiffMaSIF not only outperforms contemporary ML methods in rigid protein docking, but also matches traditional docking tools at considerably fewer numbers of generated decoys. Through DiffMaSIF, we pave the way for surface-centric interface prediction methods, thus advancing accurate prediction of protein interactions across a wide spectrum of difficult and novelty.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7368
Loading