Keywords: Protein Language Models, Protein Structure
TL;DR: We propose a post-training dual-task framework that integrates structural knowledge into pLMs, yielding performance gains across a wide range of downstream tasks.
Abstract: Protein language models (pLMs) pre-trained on vast protein sequence databases excel at various downstream tasks but lack the structural knowledge essential for many biological applications. To address this, we integrate structural insights from pre-trained protein graph neural networks (pGNNs) into pLMs through a latent-level contrastive learning task. This task aligns residue representations from pLMs with those from pGNNs across multiple proteins, enriching pLMs with inter-protein structural knowledge. Additionally, we incorporate a physical-level task that infuses intra-protein structural knowledge by optimizing pLMs to predict structural tokens. The proposed \textit{dual-task framework} effectively incorporates both inter-protein and intra-protein structural knowledge into pLMs. Given the variability in the quality of protein structures in the PDB, we further introduce a \textit{residue loss selection} module, which uses a small model trained on high-quality structures to select reliable yet challenging residue losses for the pLM to learn from. Applying our structure alignment method to the state-of-the-art ESM2 and AMPLIFY models results in notable performance gains across a wide range of tasks, including a $12.7\%$ increase in ESM2 contact prediction performance. The data, code, and resulting SaESM2 and SaAMPLIFY models will be released on Hugging Face.
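To make the dual-task objective described in the abstract concrete, below is a minimal PyTorch sketch, not the authors' released code. It assumes pLM and pGNN residue embeddings have already been projected to a shared dimension, uses an InfoNCE-style formulation for the latent-level contrastive alignment (the exact contrastive loss is not specified in the abstract), and treats the physical-level task as per-residue cross-entropy over discrete structural tokens; the function names and the mixing weight `w` are hypothetical.

```python
# Illustrative sketch only; assumes PyTorch and residue embeddings of equal dimension.
import torch
import torch.nn.functional as F


def contrastive_alignment_loss(plm_repr, pgnn_repr, temperature=0.07):
    """Latent-level task: align pLM residue embeddings with pGNN residue embeddings
    across a batch of residues drawn from multiple proteins (InfoNCE-style)."""
    z_plm = F.normalize(plm_repr, dim=-1)   # (N_residues, d)
    z_gnn = F.normalize(pgnn_repr, dim=-1)  # (N_residues, d)
    logits = z_plm @ z_gnn.t() / temperature
    targets = torch.arange(z_plm.size(0), device=logits.device)  # matched pairs on the diagonal
    return F.cross_entropy(logits, targets)


def structural_token_loss(token_logits, structure_tokens):
    """Physical-level task: predict a discrete structural token for each residue.
    token_logits: (batch, seq_len, vocab); structure_tokens: (batch, seq_len)."""
    return F.cross_entropy(token_logits.transpose(1, 2), structure_tokens)


def dual_task_loss(plm_repr, pgnn_repr, token_logits, structure_tokens, w=0.5):
    """Weighted combination of the two tasks; w is a hypothetical mixing weight."""
    return (1 - w) * contrastive_alignment_loss(plm_repr, pgnn_repr) \
        + w * structural_token_loss(token_logits, structure_tokens)
```

In practice, the abstract's residue loss selection module would further mask or reweight the per-residue terms above, keeping only residue losses judged reliable yet challenging by a small model trained on high-quality structures; that filtering step is omitted here.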
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 7912