Relationship Extraction using Retrieval Augmented Generation for biomedical Dataset


ACL ARR 2026 January Submission8150 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: Relationship Extraction, RE, RAG, Retrieval Augmented Generation, NLP, Natural Language Processing, AI

Abstract: As the volume of structured and unstructured data grows, obtaining reliable information efficiently has become crucial. In the biomedical domain, extracting information from scientific papers is essential for staying up to date, given the increasing pace at which new research studies are published. This work focuses on identifying relationships between entities extracted from the abstracts and titles of biomedical research papers. We developed a Retrieval Augmented Generation (RAG) based system to automatically identify relations between biomedical entities. We evaluate multiple open-source Large Language Models (LLMs) and the number of examples (shots) required to improve the LLMs' results. We evaluate our methods using precision, recall, and F1 scores, and compare our approach to a traditional deep learning method that combines DeBERTa with a Convolutional Neural Network (CNN). Our results indicate that, in the 10-shot setting, Qwen models using the RAG approach achieved a higher macro F1 score than the baseline and the other LLMs. At 35 shots, the Qwen reasoning and non-reasoning models performed best, exhibiting the fewest hallucinated labels while maintaining high macro F1 scores.
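The abstract reports precision, recall, macro F1, and counts of hallucinated labels, but does not say how these are computed. The following is a minimal sketch of one plausible scoring scheme, not the authors' implementation. It assumes a fixed set of relation labels, treats any predicted label outside that set as hallucinated, and counts such a prediction as a miss for the gold label. The function name `macro_prf` and the label names in the usage note are hypothetical.

```python
from collections import Counter

def macro_prf(gold, pred, labels):
    """Return (macro_precision, macro_recall, macro_f1, n_hallucinated).

    Assumption: a prediction outside `labels` is a hallucinated label and
    is scored as a false negative for the gold label.
    """
    label_set = set(labels)
    tp, fp, fn = Counter(), Counter(), Counter()
    hallucinated = 0
    for g, p in zip(gold, pred):
        if p not in label_set:
            hallucinated += 1
            fn[g] += 1
        elif g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1

    precisions, recalls, f1s = [], [], []
    for lab in labels:
        p_den = tp[lab] + fp[lab]
        r_den = tp[lab] + fn[lab]
        prec = tp[lab] / p_den if p_den else 0.0
        rec = tp[lab] / r_den if r_den else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    # Macro averaging: unweighted mean of the per-label scores.
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n, hallucinated
```

As a usage example with made-up labels, `macro_prf(["treats", "causes", "none", "treats"], ["treats", "treats", "none", "cures"], ["treats", "causes", "none"])` scores three labels and flags `"cures"` as hallucinated because it is not in the label set. Macro averaging weights every relation type equally, so rare relations affect the score as much as frequent ones.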

Paper Type: Long

Research Area: Retrieval-Augmented Language Models

Research Area Keywords: Relationship Extraction, RE, RAG, Retrieval Augmented Generation, NLP, Natural Language Processing

Contribution Types: NLP engineering experiment

Languages Studied: English

Submission Number: 8150
