DynaPPI: A large-scale dynamic protein dataset for AI-driven advances in protein interactomics

Published: 24 Sept 2025, Last Modified: 15 Oct 2025
Venue: NeurIPS2025-AI4Science Poster
License: CC BY 4.0
Track: Track 2: Dataset Proposal Competition
Keywords: dynamic protein dataset, protein interactomics, molecular dynamics, AI4Science, diffusion models
Abstract: Diffusion models have been widely explored for protein backbone generation because of their strong generative capabilities. However, predicting the structure of unknown multi-chain protein assemblies (called “complexes” in biology) remains an unsolved challenge in AI-driven biological research. This is because existing protein datasets, whether static or dynamic, provide only static snapshots or single-entity trajectories and neglect the dynamic process by which multiple monomers form complexes. To address this gap, we present DynaPPI, a dynamic protein dataset comprising molecular dynamics (MD) trajectories of protein complex formation, from dissociated chains to the bound state. It is intended as a pivotal resource for bridging static structural biology and the inherently temporal nature of molecular interactions. With this dataset, diffusion models can explicitly learn the binding trajectories of known complexes and draw on their capacity for diverse generation to accurately predict the structures of unknown complexes, thereby further catalyzing AI-driven structural biology and protein interactomics.
Submission Number: 176