Abstract: Retrieval-Augmented Generation (RAG) mitigates hallucinations in Large Language Models (LLMs) and improves factual accuracy by incorporating external knowledge during inference. However, existing RAG approaches face critical limitations in resource-constrained environments: (1) dense retrieval often introduces irrelevant or redundant content due to semantic overlap in vector space, and (2) graph-based methods still depend on sophisticated reasoning capabilities that exceed the capacity of Small Language Models (SLMs). To address these challenges, we propose Adaptive Graph-Chunk RAG (AdaGCRAG), a lightweight, local, and query-adaptive RAG framework that integrates chunk-level retrieval with graph-structured reasoning. AdaGCRAG categorizes each input as a Simple Specific Question (Simple SQ), Complex Specific Question (Complex SQ), or Abstract Question (Abstract Q) and adaptively controls the retrieval granularity and reasoning depth. It leverages Qwen3's dual-mode reasoning to minimize computation for simple tasks while enabling graph-based multi-hop inference for complex ones. A triple-level reranking module filters noisy subgraphs before fusion with dense retrieval results. AdaGCRAG runs efficiently on a single RTX 3070 GPU, making it suitable for personal agents and offline assistants. Experiments across five Question Answering (QA) benchmarks show consistent improvements over state-of-the-art lightweight RAG systems, with gains of 9.5%, 6.9%, and 6.8% on HotpotQA, MultihopRAG, and MuSiQue, respectively, demonstrating its effectiveness in limited-resource settings. Our implementation is available at https://anonymous.4open.science/r/AdaGCRAG.
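The query-adaptive control described above can be sketched as a routing policy: classify the query into one of the three categories, then set retrieval granularity and whether graph reasoning is enabled. This is a hypothetical illustration, not the paper's implementation; the real classifier is an SLM (Qwen3), which the toy keyword heuristic below merely stands in for, and the category names are the only elements taken from the abstract.

```python
# Hypothetical sketch of AdaGCRAG-style query-adaptive routing.
# The paper's classifier is LLM-based (Qwen3); the keyword heuristic here
# is a stand-in assumption for illustration only.
from dataclasses import dataclass


@dataclass
class Plan:
    category: str       # "Simple SQ", "Complex SQ", or "Abstract Q"
    use_graph: bool     # enable graph-structured multi-hop reasoning?
    top_k_chunks: int   # dense-retrieval granularity (illustrative values)


def classify(query: str) -> str:
    """Toy stand-in for the model-based query classifier."""
    q = query.lower()
    if any(w in q for w in ("why", "compare", "summarize", "overview")):
        return "Abstract Q"
    if any(w in q for w in (" both ", " after ", " before ")):
        return "Complex SQ"
    return "Simple SQ"


def plan(query: str) -> Plan:
    """Map the predicted category to retrieval granularity and reasoning depth."""
    cat = classify(query)
    if cat == "Simple SQ":
        # Cheap path: chunk-level dense retrieval only, minimal computation.
        return Plan(cat, use_graph=False, top_k_chunks=3)
    if cat == "Complex SQ":
        # Multi-hop path: fuse graph subgraphs (after reranking) with chunks.
        return Plan(cat, use_graph=True, top_k_chunks=5)
    # Abstract questions: broad coverage plus graph-level context.
    return Plan(cat, use_graph=True, top_k_chunks=8)


print(plan("Who wrote Dune?"))
print(plan("Which actor starred in both films?"))
```

The design point this sketch illustrates is that reasoning depth is a per-query decision rather than a fixed pipeline setting, which is what lets simple questions avoid the cost of graph traversal on constrained hardware.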
External IDs: dblp:conf/semweb/ZhangXHGZZ25