Abstract: Graph-based Retrieval-Augmented Generation (GraphRAG) enhances Large Language Models (LLMs) by integrating structured knowledge graphs, but it suffers from suboptimal path selection and redundant entity retrieval. To address these issues, we propose three key improvements: (1) LLM-driven Structured Entity Extraction, which improves query understanding by extracting structured entities prior to retrieval; (2) Beam Search-based Path Filtering, which selects globally coherent reasoning paths instead of relying on greedy nearest-neighbor search; and (3) Semantic Diversity Score (SDS), a novel metric that reduces redundancy by quantifying the diversity of retrieved entity clusters.
We evaluate our approach on four multiple-choice QA datasets: MCTest, LexGLUE CaseHold, PubMedQA, and MedQA. Our method improves accuracy by $+1.16\%$, $+6.53\%$, $+4.9\%$, and $+0.31\%$, respectively, compared to the baseline LLaMA 3.1-8B, demonstrating enhanced retrieval informativeness and path coherence. Additionally, experiments on various LLMs, including Qwen2.5-7B, Gemma2-9B, and LLaMA 3.1-8B, show accuracy increases of $+12.34\%$, $+22.50\%$, and $+1.33\%$ on MCTest, respectively. While our method improves factual consistency and reasoning quality, further work is needed to adapt SDS to domain-specific tasks such as biomedical question answering.
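To illustrate why beam search can select more globally coherent paths than greedy nearest-neighbor expansion, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical toy graph whose edges carry relevance scores in (0, 1] and scores a path by the product of its edge scores; the names `greedy_path`, `beam_search_paths`, and `toy_graph` are introduced here purely for illustration.

```python
import heapq
from typing import Dict, List, Tuple

# Assumption: node -> list of (neighbor, edge relevance score in (0, 1]).
Graph = Dict[str, List[Tuple[str, float]]]
Path = Tuple[List[str], float]


def greedy_path(graph: Graph, start: str, depth: int) -> Path:
    """Follow the single best-scoring edge at every step."""
    path, score = [start], 1.0
    for _ in range(depth):
        candidates = [(n, s) for n, s in graph.get(path[-1], []) if n not in path]
        if not candidates:
            break
        nxt, s = max(candidates, key=lambda c: c[1])
        path.append(nxt)
        score *= s
    return path, score


def beam_search_paths(graph: Graph, start: str, depth: int, beam_width: int) -> List[Path]:
    """Keep the `beam_width` best partial paths at every step."""
    beam: List[Path] = [([start], 1.0)]
    for _ in range(depth):
        expanded: List[Path] = []
        for path, score in beam:
            for n, s in graph.get(path[-1], []):
                if n not in path:  # avoid cycles
                    expanded.append((path + [n], score * s))
        if not expanded:
            break
        # Simplification: paths that cannot be extended are dropped here.
        beam = heapq.nlargest(beam_width, expanded, key=lambda p: p[1])
    return beam


# Hypothetical toy graph: the locally best first hop (A) leads to a weak second hop.
toy_graph: Graph = {
    "query": [("A", 0.9), ("B", 0.8)],
    "A": [("C", 0.1)],
    "B": [("D", 0.9)],
}

greedy = greedy_path(toy_graph, "query", depth=2)
best = beam_search_paths(toy_graph, "query", depth=2, beam_width=2)[0]
print("greedy path:", greedy[0])  # query -> A -> C
print("beam path:  ", best[0])    # query -> B -> D
```

In this toy case the greedy strategy commits to the strongest first edge and ends on a weak path, while a beam of width 2 keeps the alternative alive and recovers the higher-scoring path overall. The scoring function used in the paper may differ.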
Paper Type: Long
Research Area: Generation
Research Area Keywords: Generation, Machine Learning for NLP, NLP Applications, Question Answering
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 3876