HyperRAG: Combining RAG and Hyperbolic Embeddings for Phenotypes Linking from Text

ACL ARR 2025 May Submission2207 Authors

18 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, Readers: Everyone, License: CC BY 4.0
Abstract: Extracting knowledge from unstructured data is critical for advancing understanding and supporting decision-making across many domains. It is especially relevant in genomics, where identifying phenotypes in clinical narratives improves diagnostic precision and enables personalized medicine. Current methods recognize explicitly stated phenotypes well, but they often miss phenotypes that are described implicitly or in nuanced terms. In this paper, we introduce a novel workflow that integrates Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) and hierarchical reranking, using hyperbolic embeddings trained on the Human Phenotype Ontology (HPO). We further argue that conventional evaluation based on exact string matching is insufficient, because it ignores the hierarchical structure of the target ontology: a prediction that is the parent or child of the gold term receives no more credit than an unrelated term. To address this, we propose new evaluation metrics that exploit the hierarchical relationships in HPO. Experiments on benchmark datasets, including CHU-50, a newly curated and challenging dataset, show that our approach substantially improves ranking accuracy and overall performance.
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: Information Extraction; Information Retrieval and Text Mining; NLP Applications; Resources and Evaluation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 2207
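The abstract names two technical ingredients without specifying them: ranking candidate HPO terms with hyperbolic embeddings, and scoring predictions with hierarchy-aware metrics. The Python sketch below illustrates both ideas under stated assumptions. It is not the paper's implementation. It uses the standard Poincaré-ball geodesic distance to order candidates. In place of the paper's unspecified metrics, it uses the well-known hierarchical precision/recall scheme, which extends the predicted and gold term sets with their ancestors before comparing them. Everything else is hypothetical: the toy ontology, the term names, the 2-D coordinates, and the `mention` point. The mention is assumed to be already embedded in the same ball, and the abstract does not say how HyperRAG obtains such a point.

```python
import math
from typing import Dict, List, Mapping, Sequence, Set, Tuple

# Hypothetical toy fragment of a phenotype ontology (child -> parents).
# HPO is a DAG, so a term may list several parents.
PARENTS: Dict[str, List[str]] = {
    "nervous_system_abnormality": ["root"],
    "seizure": ["nervous_system_abnormality"],
    "focal_seizure": ["seizure"],
    "ataxia": ["nervous_system_abnormality"],
}

# Hypothetical 2-D Poincare-ball coordinates: general terms sit near the origin,
# more specific terms sit closer to the boundary.
EMBEDDINGS: Dict[str, Tuple[float, float]] = {
    "nervous_system_abnormality": (0.1, 0.0),
    "seizure": (0.5, 0.1),
    "focal_seizure": (0.8, 0.15),
    "ataxia": (0.3, -0.6),
}


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def rerank(anchor: Sequence[float], candidates: Mapping[str, Sequence[float]]) -> List[str]:
    """Order candidate terms by hyperbolic distance to the anchor, closest first."""
    return sorted(candidates, key=lambda term: poincare_distance(anchor, candidates[term]))


def with_ancestors(terms: Set[str], parents: Mapping[str, List[str]]) -> Set[str]:
    """Return the given terms plus every ancestor reachable through parent links."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, []))
    return closed


def hierarchical_prf(predicted: Set[str], gold: Set[str],
                     parents: Mapping[str, List[str]], root: str) -> Tuple[float, float, float]:
    """Hierarchical precision, recall and F1 over ancestor-extended sets, root excluded."""
    p_hat = with_ancestors(predicted, parents) - {root}
    g_hat = with_ancestors(gold, parents) - {root}
    overlap = len(p_hat & g_hat)
    precision = overlap / len(p_hat) if p_hat else 0.0
    recall = overlap / len(g_hat) if g_hat else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical point for a text mention, assumed to live in the same ball.
mention = (0.75, 0.1)
print(rerank(mention, EMBEDDINGS))
# ['focal_seizure', 'seizure', 'nervous_system_abnormality', 'ataxia']

# The gold label is the specific term; the system predicted its parent.
scores = hierarchical_prf({"seizure"}, {"focal_seizure"}, PARENTS, root="root")
print(tuple(round(s, 3) for s in scores))
# (1.0, 0.667, 0.8)
```

Under exact matching, predicting `seizure` for a gold `focal_seizure` scores zero. The ancestor-extended sets instead give it full precision and partial recall, which is the kind of partial credit the abstract argues for. The root is excluded so that every prediction does not automatically share at least one term with the gold set.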