Learning to Predict Ensembles of Protein Conformations from Molecular Dynamics Simulation Trajectories

Bongjin Koo; Patrick Jiang; Soumya Dutta; I. Can Kazan; S. Banu Ozkan; Paul T Kim; Abhishek Singharoy; Tristan Bepler

Published: 06 Mar 2025, Last Modified: 21 Jul 2025

Venue: ICLR 2025 Workshop LMRL

License: CC BY 4.0

Track: Tiny Paper Track

Keywords: proteins, protein structure determination, protein ensembles, protein ensemble prediction

TL;DR: This paper investigates how well the AlphaFlow model can learn to predict conformational ensembles from MD trajectories.

Abstract: A group of heterogeneous conformations of a protein, also known as an ensemble of conformations, is key to understanding protein function, because many proteins are mechanical machines that perform tasks by changing their shapes. Nevertheless, protein structure prediction from sequence has so far focused mainly on accurately predicting a single structure, e.g., AlphaFold (AF) [Abramson et al. (2024)] and ESMFold [Lin et al. (2023)]. Recently, methods that predict multiple conformations by subsampling multiple sequence alignments (MSAs) [del Alamo et al. (2022)] or by clustering MSAs [Wayment-Steele et al. (2024)] have been introduced. While these methods can predict heterogeneous conformations, they are limited in the diversity of the predicted structures and in their trainability on data other than Protein Data Bank (PDB) [Berman et al. (2000)] structures, such as molecular dynamics (MD) simulation trajectories. AlphaFlow [Jing et al. (2024)] overcame these limitations by combining a Flow Matching (FM) [Lipman et al. (2023)] framework with AlphaFold as the denoising model. Since an FM model can generate diverse samples by transforming initial samples drawn from a prior distribution, AlphaFlow has the potential to generate ensembles of conformations. Its authors showed that it can be trained on MD trajectories and generate physically feasible ensembles. In this paper, we look more closely at AlphaFlow's ability to learn MD ensembles generated using Temperature Replica Exchange Molecular Dynamics (T-REMD) [Qi et al. (2018)]. This is an exploratory study that precedes improving the architecture and proposing our own model.
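
The sampling idea described in the abstract (transforming prior samples into diverse outputs with a denoising model) can be illustrated with a minimal toy sketch. This is not AlphaFlow's implementation: the 2D points, the two `modes`, and the `toy_denoiser` and `sample` functions are hypothetical stand-ins, and the sketch assumes a linear interpolation path in which the denoiser predicts the clean sample. Real protein structures, structural alignment, and the learned AlphaFold-based denoiser are all omitted.

```python
import numpy as np

# Two hypothetical "conformations", represented as 2D points for illustration only.
modes = np.array([[-2.0, 0.0], [2.0, 0.0]])

def toy_denoiser(x_t, t, modes):
    """Hypothetical stand-in for a learned denoiser: predict the nearest mode."""
    dists = np.linalg.norm(modes - x_t, axis=1)
    return modes[np.argmin(dists)]

def sample(x0, denoiser, modes, n_steps=10):
    """Euler-integrate from a prior sample x0 (t=0) to a generated sample (t=1).

    Assumes the linear path x_t = (1 - t) * x0 + t * x1, so the velocity
    implied by a clean-sample prediction x1_hat is (x1_hat - x_t) / (1 - t).
    """
    x = x0.copy()
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):  # t never reaches 1, so no division by zero
        x1_hat = denoiser(x, t, modes)
        velocity = (x1_hat - x) / (1.0 - t)
        x = x + (t_next - t) * velocity
    return x

# Different draws from the prior can end at different modes, giving an "ensemble".
rng = np.random.default_rng(0)
prior_samples = rng.normal(size=(8, 2))
ensemble = np.stack([sample(x0, toy_denoiser, modes) for x0 in prior_samples])
```

In this toy setting each prior sample is carried to whichever mode it starts closest to, which is the sense in which diversity in the prior translates into diversity in the generated samples.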

Attendance: Tristan Bepler

Submission Number: 46
