Learning millisecond protein dynamics from what is missing in NMR spectra

Gina El Nesr, Hannah Wayment-Steele, Sergey Ovchinnikov, Ramith Hettiarachchi, Hasindu Kariyawasam

Published: 25 Mar 2025, Last Modified: 04 May 2026 · OpenReview Archive Direct Upload · Everyone · CC BY 4.0

Abstract: Many proteins’ biological functions rely on interconversions between multiple conformations occurring at micro- to millisecond (µs-ms) timescales. A lack of standardized, large-scale experimental data has hindered obtaining a more predictive understanding of these motions. After curating >100 Nuclear Magnetic Resonance (NMR) relaxation datasets, we realized an observable for µs-ms dynamics might be hiding in plain sight. Millisecond dynamics can cause NMR signals to broaden beyond detection, leaving some residues not assigned in the chemical shift datasets of ∼10,000 proteins deposited in the Biological Magnetic Resonance Data Bank (BMRB) [1]. We made the bold assumption that residues missing assignments are exchange-broadened due to µs-ms motions and trained various deep learning models to predict missing assignments. Strikingly, these models also predict exchange measured via NMR relaxation experiments, indicative of µs-ms dynamics. The best of these models, which we named Dyna-1, leverages an intermediate layer of the multimodal language model ESM-3 [2]. Notably, dynamics directly linked to biological function, including enzyme catalysis and ligand binding, are particularly well predicted by Dyna-1, which parallels our findings that residues experiencing µs-ms exchange are more conserved. We anticipate the datasets and models presented here will be transformative in unlocking the common language of dynamics and function.