Representing local protein environments with machine learning force fields

Meital Bojan; Sanketh Vedula; Sai Advaith Maddipatla; Nadav Bojan; Anar Rzayev; Federico Napoli; Paul Schanda; Alexander Bronstein

Published: 26 Jan 2026, Last Modified: 11 Apr 2026

Venue: ICLR 2026 Poster

License: CC BY 4.0

Keywords: Machine learning force fields, structural biology, NMR, representation learning

TL;DR: We show that embeddings from machine learning force fields provide rich, transferable representations of local protein environments, enabling zero-shot generalization and state-of-the-art downstream performance.

Abstract: The local structure of a protein strongly impacts its function and interactions with other molecules. Representing local biomolecular environments remains a key challenge when applying machine learning approaches to protein structures. The structural and chemical variability of these environments makes them challenging to model, and representation learning on these objects remains largely under-explored. In this work, we propose representations for local protein environments that leverage intermediate features from machine learning force fields (MLFFs). We extensively benchmark state-of-the-art MLFFs, comparing their performance across latent spaces and downstream tasks, and show that their embeddings capture local structural features (e.g., secondary motifs) and chemical features (e.g., amino acid identity and protonation state), organizing protein environments into a structured manifold. We show that these representations enable zero-shot generalization and transfer across diverse downstream tasks. As a case study, we build a physics-informed, uncertainty-aware chemical shift predictor that achieves state-of-the-art accuracy in biomolecular NMR spectroscopy. Our results establish MLFFs as general-purpose, reusable representation learners for protein modeling, opening new directions in representation learning for structured physical systems.

Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)

Submission Number: 23683
