Keywords: Protein foundation models; Molecular Dynamics; Multimodal learning; Protein representation learning
TL;DR: Adding Molecular Dynamics trajectories as a modality to protein foundation models improves performance on dynamics-sensitive downstream tasks, especially when structural information is limited.
Abstract: Proteins are dynamic molecules whose function depends not only on sequence and structure but also on conformational changes over time. We investigate how Molecular Dynamics (MD) trajectories can be integrated as an additional modality in protein foundation models by extending OneProt with all-atom, time-resolved MD data curated from the mdCATH, GPCRmd, and ATLAS databases. These trajectories encode protein flexibility, conformational variability, and thermodynamic sensitivity, complementing static sequence-, structure-, and text-based representations. Using a pre-trained transformer-based MDGen encoder, we perform systematic pre-training ablations and evaluate the resulting representations across diverse protein prediction tasks. We find that incorporating MD trajectories consistently improves performance on downstream tasks sensitive to dynamics or structural context, particularly when explicit structural information is limited. Our results demonstrate that transformer-based MD encoders capture biologically meaningful dynamic signals that enhance protein foundation models, highlighting the value of integrating protein dynamics for potential applications such as protein engineering and drug discovery.
Submission Number: 96