Keywords: proteins, protein structure, structural biology, Boltzmann distribution
Abstract: Recent breakthroughs in protein structure prediction have pointed to structural ensembles as the next frontier in the computational understanding of protein structure. At the same time, iterative refinement techniques such as diffusion have driven significant advancements in generative modeling. We explore the synergy of these developments by combining AlphaFold and ESMFold with flow matching, a powerful modern generative modeling framework, in order to sample the conformational landscape of proteins. When trained on the PDB and evaluated on proteins with multiple recent structures, our method produces ensembles with similar precision and greater diversity compared to MSA subsampling. When further fine-tuned on coarse-grained molecular dynamics trajectories, our model generalizes to unseen proteins and accurately predicts conformational flexbility, captures the joint distribution of atomic positions, and models higher-order physiochemical properties such as intermittent contacts and solvent exposure. These results open exciting avenues in the computational prediction of conformational flexibility.
Submission Track: Original Research
Submission Number: 78
Loading