DynPro: A Large-Scale Dataset of Molecular Dynamics Simulations for Protein Conformational Ensembles and Transitions
Track: Track 2: Dataset Proposal Competition
Keywords: Protein dynamics, Molecular dynamics simulations, Enhanced sampling, Generative AI
Abstract: Protein dynamics underpin critical biological processes, yet existing datasets for AI-driven modeling are limited to short timescales and local fluctuations. They fail to capture the broad conformational ensembles, transitions, and complex interactions essential for drug design and biomedicine. We propose DynPro, a large-scale, openly shareable dataset of enhanced-sampling molecular dynamics (MD) simulations for tens of thousands of protein systems. Each system is sampled for at least 100 $\mu$s of effective simulation time using adaptive sampling techniques, and is released with atomistic trajectories, Boltzmann-weighted free energies, and kinetic metadata in mmCIF format.
DynPro enables generative AI models to learn long-timescale ensembles and rare transitions, addressing both the computational bottleneck and the data bottleneck. Built from public PDB structures using advanced MD simulations on HPC clusters (estimated at ~\$50-100M), it is intended as a resource for drug design, disease mechanism studies, and synthetic biology, and as a foundation for AI-driven structural dynamics.
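As an illustration of what "Boltzmann-weighted free energies" could mean in practice, the sketch below converts per-state statistical weights into relative free energies using the standard relation $F_i = -k_B T \ln p_i$. The function name, the kcal/mol units, and the 300 K default are assumptions made for this example; the abstract does not specify how DynPro stores or computes these quantities.

```python
import math

# Gas constant in kcal/(mol*K); the unit choice is an assumption for this sketch.
K_B_KCAL = 0.001987204


def relative_free_energies(weights, temperature=300.0):
    """Return F_i = -kT * ln(p_i) per state, shifted so the lowest value is 0.

    `weights` are unnormalized, strictly positive Boltzmann weights
    (hypothetical input; not a documented DynPro field).
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be non-empty and strictly positive")
    total = sum(weights)
    kt = K_B_KCAL * temperature
    # Normalize weights to probabilities, then apply F = -kT ln p.
    energies = [-kt * math.log(w / total) for w in weights]
    lowest = min(energies)
    return [e - lowest for e in energies]


if __name__ == "__main__":
    # Three hypothetical conformational states with relative populations 70:25:5.
    for state, f in enumerate(relative_free_energies([0.70, 0.25, 0.05])):
        print(f"state {state}: {f:.3f} kcal/mol")
```

The most populated state is assigned zero free energy, and rarer states receive larger values. This is why rare transitions are expensive to sample directly with unbiased MD.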
Submission Number: 121