Keywords: rare diseases; knowledge graphs
Abstract: Rare genetic disorders affect over 300 million people worldwide and remain difficult to diagnose, with current genomic approaches yielding definitive answers in only 30-50% of cases. Existing phenotype-driven methods often rely on expert-curated candidate gene lists and are sensitive to incomplete clinical data, limiting their real-world utility. We present PhenoKG, a knowledge-graph framework that enriches patient phenotypes with biomedical knowledge and can rank a flexible number of genes ($\sim$4,000) by their likelihood of being causative. PhenoKG integrates graph neural networks with transformer-based encoders to capture patient-specific phenotype-gene relationships, and incorporates an optional reranking procedure that leverages recently validated clinical associations to extend the knowledge graph while maintaining robustness to noisy or incomplete input. Designed to operate with or without candidate lists, PhenoKG achieves strong performance across diverse diagnostic settings and consistently outperforms state-of-the-art methods on rare disease benchmarks.
Together, these results position PhenoKG as a step toward scalable, phenotype-first models for rare disease diagnosis, and open the path to integrating heterogeneous biomedical data for faster, more equitable genetic discovery.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 3065