ESMfluc: Predicting Flexible Regions in a Protein Using Language Models

ICLR 2026 Conference Submission 22340 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Protein language modeling, BiLSTM, Attention, Molecular Dynamics, Protein flexibility, Evolutionary Scale Modeling
TL;DR: We predict flexible regions in proteins by fine-tuning a large language model on molecular dynamics data.
Abstract: Proteins are dynamic molecular machines whose functionality emerges not merely from their static structures but critically from their intrinsic conformational flexibility. Understanding how a protein sequence encodes this flexibility is essential for deciphering the connection between sequence, dynamics, and biological function. While recent advances in deep learning and protein language models have significantly improved structural prediction, predicting sequence-encoded dynamics remains challenging. In this work, we introduce ESMfluc, a biLSTM model trained on molecular dynamics simulation data, utilizing embeddings from the Evolutionary Scale Modeling (ESM) architecture to predict local flexibility directly from protein sequences. Using fluctuation data derived from extensive molecular dynamics simulations, ESMfluc accurately identifies flexible residues without computationally expensive simulations while providing interpretability via attention maps. The model notably highlights distal flexible regions relevant for allosteric regulation and drug targeting. Our approach demonstrates substantial improvements over traditional flexibility proxies, offering researchers a computationally efficient method to reveal critical functional sites beyond active or binding regions.
Supplementary Material: zip
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 22340