Phenotype-Conditioned Drug Repurposing for Undiagnosed Rare Disease Patients via Graph Neural Networks and LLM Hybridization

Published: 28 May 2026, Last Modified: 28 May 2026, Venue: GenBio 2026 Poster, License: CC BY 4.0
Keywords: drug repurposing, rare diseases, knowledge graphs, graph neural networks, LLM-GNN integration, phenotype-driven prediction, PrimeKG, R-GCN, large language models
TL;DR: We bypass the rare disease diagnostic bottleneck by ranking drugs directly from clinical phenotypes via an R-GCN+LLM hybrid architecture, with score-level fusion recovering 76% of the oracle-diagnosis TxGNN ceiling without ever using a disease label.
Abstract: Rare disease patients often face years of diagnostic delay, and many never receive a confirmed molecular diagnosis. Existing drug repurposing models usually require a disease label, while phenotype-based diagnostic tools stop at diagnosis without recommending treatment. We formulate diagnosis-free drug repurposing as a phenotype-set $\rightarrow$ drug ranking task on PrimeKG, evaluated on 108 held-out diseases with 914 disease--drug pairs. We propose a graph--LLM hybrid that combines an R-GCN encoder, drug-conditioned cross-attention, and biomedical text embeddings. Score-level fusion achieves an MRR of 0.325 on all test diseases and 0.311 on the 78 diseases where PubCaseFinder fails, tripling the cascade MRR of 0.103 and recovering 76\% of the oracle TxGNN ceiling without using a diagnosis. Our method is the strongest clean model and is more robust to missing disease labels, dropping only 3--4\% from the full set to the undiagnosable subset, compared with 67\% for the cascade.
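Illustration: the abstract reports score-level fusion and MRR but does not spell out either computation, so the following is only a minimal sketch under stated assumptions. It assumes fusion is a weighted sum of z-normalized per-drug scores from a graph model and a text model, and that MRR uses the rank of the best-ranked true drug per disease. The function names, the weight `alpha`, and the toy scores are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Standardize one model's scores over the candidate drugs (assumed normalization)."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(graph_scores, text_scores, alpha=0.5):
    """Score-level fusion as a weighted sum; alpha is a hypothetical mixing weight."""
    g, t = zscore(graph_scores), zscore(text_scores)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(g, t)]


def reciprocal_rank(scores, true_drugs):
    """1 / rank of the best-ranked true drug; 0.0 if none of the true drugs is a candidate."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for rank, drug in enumerate(order, start=1):
        if drug in true_drugs:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average reciprocal rank over (scores, true_drugs) pairs, one pair per disease."""
    return mean(reciprocal_rank(s, t) for s, t in queries)


# Toy example with made-up scores for four candidate drugs (indices 0..3).
graph_scores = [0.1, 0.9, 0.3, 0.2]
text_scores = [0.2, 0.4, 0.8, 0.1]
fused = fuse_scores(graph_scores, text_scores)
toy_mrr = mean_reciprocal_rank([(fused, {2}), (fused, {1})])
```

Normalizing before mixing keeps one model's score scale from dominating the other; a rank-based fusion rule would be an equally plausible reading of "score-level fusion".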
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 173