JUST ADD STRUCTURE: PROTEIN LANGUAGE MODELS COMBINED WITH STRUCTURAL EQUIVARIANCE EXCEL AT PROTEIN TASKS

Published: 02 Mar 2026, Last Modified: 05 Mar 2026 · GEM 2026 · CC BY 4.0
Keywords: sequence, structure, protein, egnn, equivariance, language models
TL;DR: We contend that prioritizing higher-fidelity representations and biology-aware architectures will yield significantly greater dividends in protein modeling than indiscriminate parameter scaling or elaborate adaptation of PLMs.
Abstract: Accurate in silico prediction of protein properties, functional fitness, and mutational effects remains a central challenge in protein engineering and therapeutic design. While Protein Language Models (PLMs) successfully capture rich evolutionary and functional constraints from sequence data, they only indirectly encode the spatial and geometric information that fundamentally governs protein function. Consequently, state-of-the-art approaches typically rely on extensive fine-tuning, ensembling, or the incorporation of handcrafted structural features to achieve competitive accuracy, making them computationally expensive and difficult to scale. In this work, we demonstrate that explicit geometric modeling can substitute for, and in most cases outperform, large-scale PLM fine-tuning, with much higher parameter efficiency. Our approach, ProtEGNN, pairs PLM residue representations with a lightweight E(3)-Equivariant Graph Neural Network, competing with or achieving state-of-the-art performance across seven different benchmarks in protein property, mutational effect, and function prediction, while needing 100–1000× fewer parameters than competing approaches. Notably, even when paired with the smallest readily available PLM, ESM2-T6 (8M parameters), ProtEGNN matches fine-tuned, sequence-only methods on mutational effect prediction, despite training orders of magnitude fewer parameters. Together, these results highlight geometric inductive bias as a powerful and scalable alternative to task-specific fine-tuning of large PLMs for protein modeling.
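The abstract pairs invariant PLM residue embeddings with an E(3)-equivariant GNN over residue coordinates. As a rough illustration only (not the authors' implementation; all names here are hypothetical), a single EGNN-style layer in the spirit of Satorras et al. (2021) can be sketched in numpy: invariant messages are built from feature pairs and squared distances, coordinates are updated along relative-position vectors scaled by invariant weights, which makes the coordinate output rotate and translate with the input.

```python
import numpy as np


def mlp(w, b, x):
    # Single linear layer + tanh, standing in for the learned MLPs of a real EGNN.
    return np.tanh(x @ w + b)


class EGNNLayer:
    """Minimal E(3)-equivariant layer sketch (EGNN-style, illustrative only).

    h: (N, F) invariant per-residue features (e.g. PLM embeddings)
    x: (N, 3) residue coordinates
    """

    def __init__(self, feat_dim, msg_dim, rng):
        self.We = rng.normal(0, 0.1, (2 * feat_dim + 1, msg_dim))  # edge MLP
        self.be = np.zeros(msg_dim)
        self.Wx = rng.normal(0, 0.1, (msg_dim, 1))                 # coord weights
        self.Wh = rng.normal(0, 0.1, (feat_dim + msg_dim, feat_dim))  # node MLP
        self.bh = np.zeros(feat_dim)

    def __call__(self, h, x):
        n = h.shape[0]
        diff = x[:, None, :] - x[None, :, :]            # (N, N, 3) relative vectors
        d2 = (diff ** 2).sum(-1, keepdims=True)          # squared distances: invariant
        hi = np.repeat(h[:, None, :], n, axis=1)
        hj = np.repeat(h[None, :, :], n, axis=0)
        # Messages depend only on invariants, so they are rotation/translation invariant.
        m = mlp(self.We, self.be, np.concatenate([hi, hj, d2], axis=-1))
        m = m * (1.0 - np.eye(n)[..., None])             # drop self-messages
        # Coordinate update: invariant scalars times relative vectors -> equivariant.
        x_new = x + (diff * (m @ self.Wx)).sum(1) / (n - 1)
        # Feature update: aggregate invariant messages into new invariant features.
        h_new = mlp(self.Wh, self.bh, np.concatenate([h, m.sum(1)], axis=-1))
        return h_new, x_new
```

Rotating and translating the input coordinates rotates and translates the output coordinates identically, while the feature output is unchanged; this is the geometric inductive bias the abstract argues can replace large-scale PLM fine-tuning.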
Presenter: ~Qurat_ul_ain1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Funding: No, the presenting author of this submission does not fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 108