JUST ADD STRUCTURE: PROTEIN LANGUAGE MODELS COMBINED WITH STRUCTURAL EQUIVARIANCE EXCEL AT PROTEIN TASKS

Published: 02 Mar 2026, Last Modified: 05 Mar 2026 · GEM 2026 · CC BY 4.0
Keywords: sequence, structure, protein, egnn, equivariance, language models
TL;DR: We contend that prioritizing higher-fidelity representations and biology-aware architectures will yield significantly greater dividends in protein modeling than indiscriminate parameter scaling or elaborate adaptation of PLMs.
Abstract: Accurate in silico prediction of protein properties, functional fitness, and mutational effects remains a central challenge in protein engineering and therapeutic design. While Protein Language Models (PLMs) successfully capture rich evolutionary and functional constraints from sequence data, they only indirectly encode the spatial and geometric information that fundamentally governs protein function. Consequently, state-of-the-art approaches typically rely on extensive fine-tuning, ensembling, or the incorporation of handcrafted structural features to achieve competitive accuracy, making them computationally expensive and difficult to scale. In this work, we demonstrate that explicit geometric modeling can substitute for, and in most cases outperform, large-scale PLM fine-tuning, with much higher parameter efficiency. Our approach, ProtEGNN, pairs PLM residue representations with a lightweight E(3)-Equivariant Graph Neural Network, competing with or achieving state-of-the-art performance across seven different benchmarks in protein property, mutational effect, and function prediction, while needing 100–1000× fewer parameters than competing approaches. Notably, even when paired with the smallest readily available PLM, ESM2-T6 (8M parameters), ProtEGNN matches fine-tuned, sequence-only methods on mutational effect prediction, despite training orders of magnitude fewer parameters. Together, these results highlight geometric inductive bias as a powerful and scalable alternative to task-specific fine-tuning of large PLMs for protein modeling.
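The abstract pairs invariant PLM residue embeddings with an E(3)-equivariant GNN over residue coordinates. As a rough illustration only (not the authors' implementation; all names here are hypothetical), a single EGNN-style layer in the spirit of Satorras et al. (2021) can be sketched in numpy: invariant messages are built from feature pairs and squared distances, coordinates are updated along relative-position vectors scaled by invariant weights, which makes the coordinate output rotate and translate with the input.

```python
import numpy as np


def mlp(w, b, x):
    # Single linear layer + tanh, standing in for the learned MLPs of a real EGNN.
    return np.tanh(x @ w + b)


class EGNNLayer:
    """Minimal E(3)-equivariant layer sketch (EGNN-style, illustrative only).

    h: (N, F) invariant per-residue features (e.g. PLM embeddings)
    x: (N, 3) residue coordinates
    """

    def __init__(self, feat_dim, msg_dim, rng):
        self.We = rng.normal(0, 0.1, (2 * feat_dim + 1, msg_dim))  # edge MLP
        self.be = np.zeros(msg_dim)
        self.Wx = rng.normal(0, 0.1, (msg_dim, 1))                 # coord weights
        self.Wh = rng.normal(0, 0.1, (feat_dim + msg_dim, feat_dim))  # node MLP
        self.bh = np.zeros(feat_dim)

    def __call__(self, h, x):
        n = h.shape[0]
        diff = x[:, None, :] - x[None, :, :]            # (N, N, 3) relative vectors
        d2 = (diff ** 2).sum(-1, keepdims=True)          # squared distances: invariant
        hi = np.repeat(h[:, None, :], n, axis=1)
        hj = np.repeat(h[None, :, :], n, axis=0)
        # Messages depend only on invariants, so they are rotation/translation invariant.
        m = mlp(self.We, self.be, np.concatenate([hi, hj, d2], axis=-1))
        m = m * (1.0 - np.eye(n)[..., None])             # drop self-messages
        # Coordinate update: invariant scalars times relative vectors -> equivariant.
        x_new = x + (diff * (m @ self.Wx)).sum(1) / (n - 1)
        # Feature update: aggregate invariant messages into new invariant features.
        h_new = mlp(self.Wh, self.bh, np.concatenate([h, m.sum(1)], axis=-1))
        return h_new, x_new
```

Rotating and translating the input coordinates rotates and translates the output coordinates identically, while the feature output is unchanged; this is the geometric inductive bias the abstract argues can replace large-scale PLM fine-tuning.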
Presenter: ~Qurat_ul_ain1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Funding: No, the presenting author of this submission does not fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 108