ProtoGNN: Prototype-Conditioned Graph Refinement for Meaningful RNA--Protein Interaction Representations
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Track: long paper (up to 10 pages)
Keywords: RNA–protein interaction, representation learning, pretrained biological language models, bipartite graph neural networks, prototype learning, retrieval and ranking, candidate prioritization
TL;DR: ProtoGNN refines fixed sequence embeddings via prototype-conditioned graph updates and logit-level contrastive regularization, enabling robust RNA–protein interaction modeling and RBP candidate prioritization.
Abstract: We study RNA--protein interaction (RPI) prediction in a setting where each RNA and protein is represented only by fixed pretrained sequence embeddings.
We propose ProtoGNN, a bipartite graph refinement model that augments standard edge scoring with prototype-conditioned, contrastive-inspired logit regularization.
ProtoGNN refines node states via streamlined bipartite propagation with adaptive raw$\rightarrow$graph fusion, constructs type-specific prototypes, and uses them to provide pair-specific global context during scoring.
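The refinement step described above can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact architecture: the mean-neighbor aggregation, the sigmoid fusion gate, the mean-pooled prototypes, and the dot-product scoring with a fixed 0.1 context weight are all assumptions chosen for brevity.

```python
import numpy as np

def refine(rna, prot, adj):
    """One round of bipartite propagation with an adaptive raw->graph fusion gate.
    rna:  (n_r, d) fixed pretrained RNA embeddings
    prot: (n_p, d) fixed pretrained protein embeddings
    adj:  (n_r, n_p) binary interaction matrix over training edges"""
    deg_r = adj.sum(1, keepdims=True).clip(min=1)
    deg_p = adj.sum(0, keepdims=True).T.clip(min=1)
    msg_r = adj @ prot / deg_r        # mean of protein neighbors for each RNA
    msg_p = adj.T @ rna / deg_p       # mean of RNA neighbors for each protein
    # adaptive fusion: per-node sigmoid gate mixes raw embedding with graph message
    g_r = 1.0 / (1.0 + np.exp(-(rna * msg_r).sum(1, keepdims=True)))
    g_p = 1.0 / (1.0 + np.exp(-(prot * msg_p).sum(1, keepdims=True)))
    h_r = (1.0 - g_r) * rna + g_r * msg_r
    h_p = (1.0 - g_p) * prot + g_p * msg_p
    # type-specific prototypes: one global summary vector per node type
    proto_r, proto_p = h_r.mean(0), h_p.mean(0)
    return h_r, h_p, proto_r, proto_p

def score(h_r, h_p, proto_r, proto_p, i, j):
    """Pair logit with prototype-conditioned global context (weight is illustrative)."""
    ctx = np.dot(proto_r, proto_p)
    return float(np.dot(h_r[i], h_p[j]) + 0.1 * ctx)
```

In this sketch the prototypes supply the same global context to every pair; a learned, pair-specific conditioning (as the abstract describes) would replace the fixed dot-product term with a trainable module.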
We also introduce a strong embedding-only baseline, PairMLP, to quantify how much signal is present in the pretrained representations alone. Across NPInter2 and RPI7317 under 5-fold edge-level cross-validation, ProtoGNN consistently improves over PairMLP and matches or modestly improves upon previously reported baselines (including ZHMolGraph) under the same benchmark setting, achieving MCC $0.9191$ on NPInter2 and $0.8387$ on RPI7317.
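A minimal sketch of the two evaluation ingredients named above: an embedding-only pair scorer in the spirit of PairMLP (the single-hidden-layer shape and ReLU are assumptions, not the paper's configuration), and the Matthews correlation coefficient (MCC) used to report results.

```python
import numpy as np

def pair_mlp_logit(e_rna, e_prot, W1, b1, w2, b2):
    """Embedding-only baseline: score a pair from concatenated fixed embeddings."""
    x = np.concatenate([e_rna, e_prot])
    h = np.maximum(W1 @ x + b1, 0.0)   # one ReLU hidden layer (illustrative)
    return float(w2 @ h + b2)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return num / den if den else 0.0
```

Because the baseline sees only the pretrained embeddings, the gap between its MCC and ProtoGNN's isolates the contribution of the graph refinement.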
To assess whether the learned representations are useful beyond binary decisions, we evaluate RNA-to-protein retrieval (RBP identification): given an RNA query, the model ranks all proteins by predicted interaction probability. On NPInter2, ProtoGNN achieves MRR $\approx 0.30$ with Recall@10 $\approx 0.26$, Recall@20 $\approx 0.51$, and Recall@50 $\approx 0.73$, indicating that refined fixed embeddings can support shortlist-style candidate prioritization.
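The retrieval metrics above can be computed from ranked lists as in the sketch below; the input convention (one list of 1-based ranks of true protein partners per RNA query, MRR taken over the best rank per query, Recall@k averaged over queries) is an assumption about the evaluation protocol.

```python
def mrr_and_recall(rankings, ks=(10, 20, 50)):
    """rankings: for each RNA query, the 1-based ranks of its true protein partners
    in the model's full ranking of all proteins.
    Returns (MRR over best rank per query, {k: mean Recall@k over queries})."""
    rr = []
    rec = {k: [] for k in ks}
    for ranks in rankings:
        rr.append(1.0 / min(ranks))                      # reciprocal of best rank
        for k in ks:
            rec[k].append(sum(r <= k for r in ranks) / len(ranks))
    n = len(rankings)
    return sum(rr) / n, {k: sum(v) / n for k, v in rec.items()}
```

For example, two queries whose best true partners sit at ranks 2 and 1 give MRR = (1/2 + 1) / 2 = 0.75, matching the shortlist-style reading: Recall@20 ≈ 0.51 means roughly half of the true partners appear in a 20-protein shortlist.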
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Abdullah_Nayem_Wasi_Emran1
Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 38