PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: We propose PPDiff, a diffusion model built on our Sequence Structure Interleaving Network with Causal attention layers (SSINC), for co-designing the sequences and structures of protein-protein complexes.
Abstract: Designing protein-binding proteins with high affinity is critical in biomedical research and biotechnology. Despite recent advances for specific protein targets, creating high-affinity binders for arbitrary protein targets on demand, without extensive rounds of wet-lab testing, remains a significant challenge. Here, we introduce PPDiff, a diffusion model that jointly designs the sequence and structure of binders for arbitrary protein targets in a non-autoregressive manner. PPDiff builds on our Sequence Structure Interleaving Network with Causal attention layers (SSINC), which interleaves self-attention layers to capture global amino acid correlations, $k$-nearest neighbor ($k$NN) equivariant graph convolutional layers to model local interactions in three-dimensional (3D) space, and causal attention layers to simplify the intricate interdependencies within the protein sequence. To assess PPDiff, we curate PPBench, a general protein-protein complex dataset comprising 706,360 complexes from the Protein Data Bank (PDB). The model is pretrained on PPBench and fine-tuned for two real-world applications: target-protein mini-binder complex design and antigen-antibody complex design. PPDiff consistently surpasses baseline methods, achieving success rates of 50.00%, 23.16%, and 16.89% for the pretraining task and the two downstream applications, respectively.
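The abstract names three layer types that SSINC interleaves. The NumPy sketch below shows one way such a block could be wired. It is not the paper's implementation. It assumes an EGNN-style update (Satorras et al., 2021) for the kNN equivariant graph convolution, a standard upper-triangular "future" mask for causal attention, and one 3D point per residue. The layer order, the function names (`knn_graph`, `egnn_layer`, `self_attention`, `ssinc_like_block`) and the weight shapes are all hypothetical.

```python
import numpy as np

def knn_graph(coords: np.ndarray, k: int) -> np.ndarray:
    """Return (N, k) indices of each residue's k nearest neighbors, excluding itself."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a residue is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def silu(z: np.ndarray) -> np.ndarray:
    return z / (1.0 + np.exp(-z))

def egnn_layer(h, x, nbrs, p):
    """EGNN-style equivariant graph convolution on a kNN graph (assumed form).

    h: (N, F) invariant residue features; x: (N, 3) coordinates; nbrs: (N, k).
    p: hypothetical weights W_e (2F+1, M), W_x (M, 1), W_h (F+M, F).
    """
    k = nbrs.shape[1]
    h_i = np.repeat(h[:, None, :], k, axis=1)             # (N, k, F)
    h_j = h[nbrs]                                          # (N, k, F)
    rel = x[:, None, :] - x[nbrs]                          # (N, k, 3) relative vectors
    d2 = np.sum(rel ** 2, axis=-1, keepdims=True)          # (N, k, 1) invariant distances
    m = silu(np.concatenate([h_i, h_j, d2], axis=-1) @ p["W_e"])  # (N, k, M) messages
    # Coordinates move along relative vectors scaled by invariant weights -> E(3)-equivariant
    x_new = x + np.mean(rel * (m @ p["W_x"]), axis=1)
    # Features are updated from aggregated messages -> E(3)-invariant
    h_new = h + silu(np.concatenate([h, m.sum(axis=1)], axis=-1) @ p["W_h"])
    return h_new, x_new

def self_attention(h, W_q, W_k, W_v, causal=False):
    """Single-head residual self-attention; causal=True hides later positions."""
    q, key, v = h @ W_q, h @ W_k, h @ W_v
    scores = q @ key.T / np.sqrt(q.shape[-1])
    if causal:
        n = h.shape[0]
        future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True where j > i
        scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)    # numerical stability
    w = np.exp(scores)
    w = w / w.sum(axis=-1, keepdims=True)
    return h + w @ v

def ssinc_like_block(h, x, k, p):
    """Hypothetical interleaving: global attention, kNN graph conv, causal attention."""
    h = self_attention(h, *p["attn"])                     # global amino-acid correlations
    h, x = egnn_layer(h, x, knn_graph(x, k), p["egnn"])   # local 3D interactions
    h = self_attention(h, *p["causal"], causal=True)      # ordered sequence dependencies
    return h, x
```

The coordinate update moves each residue along relative vectors weighted by rotation-invariant scalars. Rotating or translating the input complex therefore rotates or translates the output coordinates in the same way, and the residue features stay unchanged.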
Lay Summary: How can we automatically design high-affinity protein binders for arbitrary protein targets? We present PPDiff, a diffusion-based generative framework for designing high-affinity protein-binding proteins. PPDiff operates in a hybrid sequence–structure space, simultaneously generating binder sequences and their corresponding backbone structures for a given protein target. This joint modeling approach allows PPDiff to capture the complex interplay between sequence, structure, and binding specificity in protein–protein interactions. To support research in this area, we create PPBench, a curated dataset of protein–protein complexes for benchmarking binder design tasks. PPDiff achieves high success rates on PPBench as well as on two additional challenging tasks: target protein–mini binder complex design and antigen–antibody complex design. Furthermore, our model generalizes well, producing diverse and novel binders with high affinities across a broad range of protein targets.
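The minimal forward-noising sketch below illustrates what diffusing in a hybrid sequence-structure space can look like. The summary does not specify PPDiff's actual noise processes, schedule, or conditioning scheme. This sketch therefore makes several assumptions:

- backbone coordinates (one point per residue) follow a standard Gaussian, DDPM-style process with a linear β schedule;
- residue types follow a uniform categorical process over the 20 standard amino acids;
- only binder positions are noised, and the target is held fixed.

`linear_alpha_bar`, `noise_hybrid`, and the masking scheme are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residue types
NUM_TYPES = len(AMINO_ACIDS)

def linear_alpha_bar(T: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def noise_hybrid(seq_idx, coords, binder_mask, t, alpha_bar, rng):
    """Sample noisy (residue types, coordinates) at 0-based step t.

    seq_idx: (N,) integer residue types; coords: (N, 3) one point per residue;
    binder_mask: (N,) bool, True for binder residues (target residues stay clean).
    """
    ab = alpha_bar[t]
    # Structure: Gaussian q(x_t | x_0) = N(sqrt(ab) * x_0, (1 - ab) * I)
    x_noisy = np.sqrt(ab) * coords + np.sqrt(1.0 - ab) * rng.normal(size=coords.shape)
    # Sequence: uniform categorical q(a_t | a_0) = ab * onehot + (1 - ab) / K
    probs = ab * np.eye(NUM_TYPES)[seq_idx] + (1.0 - ab) / NUM_TYPES
    # Inverse-CDF sampling, one categorical draw per residue
    cdf = np.cumsum(probs, axis=-1)
    u = rng.random((len(seq_idx), 1))
    a_noisy = np.minimum((u > cdf).sum(axis=-1), NUM_TYPES - 1)
    # Keep the target protein fixed; only binder residues are diffused
    x_t = np.where(binder_mask[:, None], x_noisy, coords)
    a_t = np.where(binder_mask, a_noisy, seq_idx)
    return a_t, x_t
```

A denoising network would then be trained to recover the clean binder sequence and coordinates from `(a_t, x_t)` together with the fixed target. Near the final step, the binder residue types are close to uniform and the binder coordinates are close to standard normal.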
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Applications->Chemistry, Physics, and Earth Sciences
Keywords: protein complex design, protein-binding protein design, protein complex sequence and structure co-design
Submission Number: 4094