PRIMA: a bidirectional state-space architecture and training approach for sequence modelling of protein-protein interactions

Published: 28 May 2026, Last Modified: 03 Jun 2026
Venue: ICML 2026 FM4LS Workshop Poster
License: CC BY 4.0
Keywords: State-space Models, Protein-Protein Interactions, LLM, Binding Affinity Prediction
TL;DR: Insights into a sequence-based state space model for protein-protein interactions
Abstract: Protein-protein interactions (PPIs) govern essential biological processes, yet building protein language models (PLMs) that directly capture the complexity of interacting multi-chain sequences remains an open challenge. Transformer-based PLMs rely on self-attention, whose quadratic complexity in sequence length makes processing long interacting complexes resource-intensive. This leaves a gap between the rich diversity of existing single-chain PLMs and scalable PLMs for PPIs. We present PRIMA, a proof-of-concept PLM for PPIs built on bidirectional Mamba (BiMamba), a selective state-space model that scales linearly with sequence length and processes arbitrarily long inputs without positional encodings. PRIMA is trained in two phases: first on diverse single protein sequences to learn general representations, then on concatenated PPI pairs to capture interaction-specific context. At the 8-million-parameter scale, with its parameter count matched to a transformer baseline, PRIMA (i) outperforms transformer-based PLMs on standard-length inputs, (ii) outperforms them on longer sequences and length extrapolation tasks, and (iii) does so at significantly lower computational cost, establishing BiMamba as a strong backbone for scalable PPI PLMs.
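As a rough illustration of the idea behind a bidirectional selective state-space layer, the NumPy sketch below runs a diagonal linear recurrence once left to right and once right to left, then sums the two outputs. It is not PRIMA's implementation and not the Mamba library API: the function names (`selective_params`, `ssm_scan`, `bidirectional_scan`), the single-channel input, the sigmoid decay, and the choice to combine directions by summation are all assumptions made for illustration. The point it shows is that each pass is a single loop over the sequence (linear in length) and that no positional encoding is needed, since order enters only through the recurrence.

```python
import numpy as np


def selective_params(x, w_a, w_b, w_c):
    """Hypothetical 'selective' parameterisation: the recurrence parameters
    depend on the input at each position. x: (T,), w_*: (N,)."""
    a = 1.0 / (1.0 + np.exp(-np.outer(x, w_a)))  # per-step decay in (0, 1), shape (T, N)
    b = np.outer(x, w_b)                         # per-step input gain, shape (T, N)
    c = np.outer(x, w_c)                         # per-step readout, shape (T, N)
    return a, b, c


def ssm_scan(x, a, b, c):
    """Diagonal recurrence h_t = a_t * h_{t-1} + b_t * x_t, y_t = <c_t, h_t>.
    One pass over the sequence, so cost grows linearly with T."""
    T, N = a.shape
    h = np.zeros(N)
    y = np.empty(T)
    for t in range(T):
        h = a[t] * h + b[t] * x[t]
        y[t] = c[t] @ h
    return y


def bidirectional_scan(x, weights_fwd, weights_bwd):
    """Sum of a forward scan and a backward scan (assumed combination rule)."""
    y_fwd = ssm_scan(x, *selective_params(x, *weights_fwd))
    x_rev = x[::-1]
    y_bwd = ssm_scan(x_rev, *selective_params(x_rev, *weights_bwd))[::-1]
    return y_fwd + y_bwd


# Toy usage: the same weights handle sequences of different lengths.
rng = np.random.default_rng(0)
weights_fwd = tuple(rng.standard_normal(4) for _ in range(3))
weights_bwd = tuple(rng.standard_normal(4) for _ in range(3))
short_out = bidirectional_scan(np.linspace(0.5, 1.5, 8), weights_fwd, weights_bwd)
long_out = bidirectional_scan(np.linspace(0.5, 1.5, 64), weights_fwd, weights_bwd)
print(short_out.shape, long_out.shape)
```

In this toy version the backward pass is what lets the output at an early position depend on later residues, which a forward-only scan cannot do. A real model would stack many such layers over learned residue embeddings with multiple channels.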
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 4