HyperRAG: Combining Retrieval-Augmented Generation and Hyperbolic Embeddings for Ontology-Based Phenotype Linking from Text
Keywords: Information retrieval, Ontology, Biomedical NLP, Hyperbolic embeddings, Evaluation framework
Abstract: Extracting structured knowledge from unstructured text is a fundamental challenge in machine learning, particularly for concepts organized within complex hierarchical ontologies. In genomics, identifying phenotypes from clinical narratives is crucial for diagnostic precision, yet current methods struggle with contextual interpretation and subtle clinical descriptions.
We present HyperRAG, a workflow that combines semantic and hierarchical signals for ontology-based entity linking.
By integrating Large Language Models with Retrieval-Augmented Generation and a hybrid reranking strategy that uses both Euclidean (semantic) and hyperbolic (hierarchical) embeddings trained on the Human Phenotype Ontology, our approach improves entity linking while preserving ontological consistency.
We show that while hyperbolic embeddings alone improve hierarchical consistency, their main benefit emerges when used as a hierarchy-aware prior in a hybrid reranking scheme. Experiments on benchmark and real-world clinical datasets demonstrate improved recall, ranking quality, and ontological coherence compared to prior systems, particularly for implicit phenotype mentions. We further introduce a hierarchy-aware evaluation framework that reflects clinical relevance beyond exact-match metrics. All code, models, and datasets will be released upon publication.
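The hybrid reranking idea can be illustrated with a minimal sketch. The abstract does not specify the scoring function, so everything here is an assumption made for illustration: the linear blend, the weight `alpha`, the `1 / (1 + d)` transform, the use of the Poincaré ball model, and the choice of a hyperbolic anchor point for the query. The term identifiers are placeholders, not real ontology IDs.

```python
import math


def cosine_similarity(u, v):
    """Semantic score between two Euclidean embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    return math.acosh(1 + 2 * sq_diff / ((1 - sq_norm_u) * (1 - sq_norm_v)))


def hybrid_rerank(query_sem, query_hyp, candidates, alpha=0.7):
    """Rank candidates by a weighted blend of semantic and hierarchical scores.

    candidates: iterable of (term_id, euclidean_vec, hyperbolic_vec).
    alpha: hypothetical weight on the semantic score (assumption, not from the paper).
    query_hyp: hypothetical hyperbolic anchor for the mention (assumption).
    """
    scored = []
    for term_id, sem_vec, hyp_vec in candidates:
        semantic = cosine_similarity(query_sem, sem_vec)
        # Map hyperbolic distance to (0, 1]; closer in the hierarchy scores higher.
        hierarchical = 1.0 / (1.0 + poincare_distance(query_hyp, hyp_vec))
        scored.append((term_id, alpha * semantic + (1 - alpha) * hierarchical))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: TERM_B is semantically close but far away in the hierarchy.
candidates = [
    ("TERM_A", [1.0, 0.0], [0.1, 0.0]),
    ("TERM_B", [0.9, 0.1], [0.8, 0.0]),
]
ranked = hybrid_rerank([1.0, 0.0], [0.1, 0.0], candidates)
```

In this toy setup the hierarchical term acts as a prior: two candidates with nearly identical semantic scores are separated by how far they sit from the anchor in hyperbolic space.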
Paper Type: Long
Research Area: Information Extraction and Retrieval
Research Area Keywords: Information Extraction, Information Retrieval and Text Mining, NLP Applications, Machine Learning for NLP, Resources and Evaluation
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 1435