Predicting host-pathogen interactions using a proteome-scale language model

Cyril Malbranke; Cecilia Fruet; Anne-Florence Bitbol

Published: 28 May 2026, Last Modified: 11 Jun 2026. Venue: ICML 2026 FM4LS Workshop Poster. License: CC BY 4.0

Keywords: host-pathogen interactions, protein language models, fine-tuning, LoRA, attention coefficients

TL;DR: ProteomeLM encodes host-pathogen interaction signal without supervision (AUROC 0.786); LoRA fine-tuning with asymmetric masking and blocked pathogen self-attention boosts this to 0.808, generalizing across ten viral and bacterial pathogens.

Abstract: ProteomeLM is a proteome-scale language model trained on proteomes spanning the tree of life to reconstruct masked protein embeddings from proteome context within each species. Its attention coefficients capture protein-protein interactions without supervision. Here, we show that this capability extends to cross-species host-pathogen interactions (HPI) across ten human pathogen taxa spanning viruses and bacteria, and that it can be further improved with lightweight fine-tuning. We introduce ProteomeLM-HPI, a parameter-efficient adaptation via LoRA, trained on concatenated host-pathogen proteomes to reconstruct masked pathogen embeddings from host context. ProteomeLM-HPI involves two key design choices: asymmetric masking (pathogen-heavy masking) and blocked pathogen self-attention. Systematic ablations show that both choices contribute. To assess generalization, we introduce a strict cross-species benchmark enforcing pathogen-level holdout and 40% sequence-identity filtering. On this benchmark, ProteomeLM-HPI improves AUC on 8 out of 10 unseen pathogens. This is a work-in-progress report; code, data and models will be made publicly available.
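
The two design choices named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the masking rates `p_host` and `p_pathogen`, the host-then-pathogen ordering, and the choice to block every pathogen-to-pathogen pair (including a protein attending to itself) are all assumptions made for illustration only.

```python
import random


def asymmetric_mask(n_host, n_pathogen, p_host=0.05, p_pathogen=0.5, seed=0):
    """Return a list of booleans, True where a protein embedding is masked.

    Assumed layout: host proteins first, then pathogen proteins.
    The rates are hypothetical; the only property taken from the abstract
    is that masking is pathogen-heavy (p_pathogen > p_host).
    """
    rng = random.Random(seed)
    host = [rng.random() < p_host for _ in range(n_host)]
    pathogen = [rng.random() < p_pathogen for _ in range(n_pathogen)]
    return host + pathogen


def blocked_attention_allowed(n_host, n_pathogen):
    """Return allowed[i][j], True if query protein i may attend to key protein j.

    Assumption: only pathogen-to-pathogen attention is blocked, so a pathogen
    protein must be reconstructed from host context alone. Host-to-host,
    host-to-pathogen and pathogen-to-host attention stay open.
    """
    n = n_host + n_pathogen
    is_pathogen = [i >= n_host for i in range(n)]
    return [
        [not (is_pathogen[i] and is_pathogen[j]) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    mask = asymmetric_mask(n_host=3, n_pathogen=2)
    allowed = blocked_attention_allowed(n_host=3, n_pathogen=2)
    print("masked positions:", mask)
    for row in allowed:
        print(["x" if ok else "." for ok in row])
```

In a transformer, a boolean matrix like `allowed` would typically be converted to an additive mask (a large negative value where attention is disallowed) before the softmax. Under the assumptions above, every pathogen row still has open host columns, so no row is fully masked as long as the host proteome is non-empty.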

Submission Number: 51
