ProtoGNN: Prototype-Conditioned Graph Refinement for Meaningful RNA--Protein Interaction Representations
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Track: long paper (up to 10 pages)
Keywords: RNA–protein interaction, representation learning, pretrained biological language models, bipartite graph neural networks, prototype learning, retrieval and ranking, candidate prioritization
TL;DR: ProtoGNN refines fixed sequence embeddings via prototype-conditioned graph updates and logit-level contrastive regularization, enabling robust RNA–protein interaction modeling and RBP candidate prioritization.
Abstract: We study RNA--protein interaction (RPI) prediction in a setting where each RNA and protein is represented only by fixed pretrained sequence embeddings.
We propose ProtoGNN, a bipartite graph refinement model that augments standard edge scoring with prototype-conditioned, contrastive-inspired logit regularization.
ProtoGNN refines node states via streamlined bipartite propagation with adaptive raw$\rightarrow$graph fusion, constructs type-specific prototypes, and uses them to provide pair-specific global context during scoring.
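The refinement step described above can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact architecture: the mean-neighbor aggregation, the sigmoid fusion gate, the mean-pooled prototypes, and the dot-product scoring with a fixed 0.1 context weight are all assumptions chosen for brevity.

```python
import numpy as np

def refine(rna, prot, adj):
    """One round of bipartite propagation with an adaptive raw->graph fusion gate.
    rna:  (n_r, d) fixed pretrained RNA embeddings
    prot: (n_p, d) fixed pretrained protein embeddings
    adj:  (n_r, n_p) binary interaction matrix over training edges"""
    deg_r = adj.sum(1, keepdims=True).clip(min=1)
    deg_p = adj.sum(0, keepdims=True).T.clip(min=1)
    msg_r = adj @ prot / deg_r        # mean of protein neighbors for each RNA
    msg_p = adj.T @ rna / deg_p       # mean of RNA neighbors for each protein
    # adaptive fusion: per-node sigmoid gate mixes raw embedding with graph message
    g_r = 1.0 / (1.0 + np.exp(-(rna * msg_r).sum(1, keepdims=True)))
    g_p = 1.0 / (1.0 + np.exp(-(prot * msg_p).sum(1, keepdims=True)))
    h_r = (1.0 - g_r) * rna + g_r * msg_r
    h_p = (1.0 - g_p) * prot + g_p * msg_p
    # type-specific prototypes: one global summary vector per node type
    proto_r, proto_p = h_r.mean(0), h_p.mean(0)
    return h_r, h_p, proto_r, proto_p

def score(h_r, h_p, proto_r, proto_p, i, j):
    """Pair logit with prototype-conditioned global context (weight is illustrative)."""
    ctx = np.dot(proto_r, proto_p)
    return float(np.dot(h_r[i], h_p[j]) + 0.1 * ctx)
```

In this sketch the prototypes supply the same global context to every pair; a learned, pair-specific conditioning (as the abstract describes) would replace the fixed dot-product term with a trainable module.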
We also introduce a strong embedding-only baseline, PairMLP, to quantify how much signal is present in the pretrained representations alone. Across NPInter2 and RPI7317 under 5-fold edge-level cross-validation, ProtoGNN consistently improves over PairMLP and matches or modestly improves upon previously reported baselines (including ZHMolGraph) under the same benchmark setting, achieving MCC $0.9191$ on NPInter2 and $0.8387$ on RPI7317.
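A minimal sketch of the two evaluation ingredients named above: an embedding-only pair scorer in the spirit of PairMLP (the single-hidden-layer shape and ReLU are assumptions, not the paper's configuration), and the Matthews correlation coefficient (MCC) used to report results.

```python
import numpy as np

def pair_mlp_logit(e_rna, e_prot, W1, b1, w2, b2):
    """Embedding-only baseline: score a pair from concatenated fixed embeddings."""
    x = np.concatenate([e_rna, e_prot])
    h = np.maximum(W1 @ x + b1, 0.0)   # one ReLU hidden layer (illustrative)
    return float(w2 @ h + b2)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return num / den if den else 0.0
```

Because the baseline sees only the pretrained embeddings, the gap between its MCC and ProtoGNN's isolates the contribution of the graph refinement.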
To assess whether the learned representations are useful beyond binary decisions, we evaluate RNA-to-protein retrieval (RBP identification): given an RNA query, the model ranks all proteins by predicted interaction probability. On NPInter2, ProtoGNN achieves MRR $\approx 0.30$ with Recall@10 $\approx 0.26$, Recall@20 $\approx 0.51$, and Recall@50 $\approx 0.73$, indicating that refined fixed embeddings can support shortlist-style candidate prioritization.
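The retrieval metrics above can be computed from ranked lists as in the sketch below; the input convention (one list of 1-based ranks of true protein partners per RNA query, MRR taken over the best rank per query, Recall@k averaged over queries) is an assumption about the evaluation protocol.

```python
def mrr_and_recall(rankings, ks=(10, 20, 50)):
    """rankings: for each RNA query, the 1-based ranks of its true protein partners
    in the model's full ranking of all proteins.
    Returns (MRR over best rank per query, {k: mean Recall@k over queries})."""
    rr = []
    rec = {k: [] for k in ks}
    for ranks in rankings:
        rr.append(1.0 / min(ranks))                      # reciprocal of best rank
        for k in ks:
            rec[k].append(sum(r <= k for r in ranks) / len(ranks))
    n = len(rankings)
    return sum(rr) / n, {k: sum(v) / n for k, v in rec.items()}
```

For example, two queries whose best true partners sit at ranks 2 and 1 give MRR = (1/2 + 1) / 2 = 0.75, matching the shortlist-style reading: Recall@20 ≈ 0.51 means roughly half of the true partners appear in a 20-protein shortlist.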
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Abdullah_Nayem_Wasi_Emran1
Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 38