Abstract: Retrieval-augmented generation (RAG) systems aim to improve the reliability of answers by incorporating information from external sources. The value of RAG depends on how well the knowledge base meets users' information needs. However, most existing evaluation methods for RAG pipelines focus on the quality of the generated answers or the precision of the retriever, without assessing whether the knowledge base itself contains the needed information. RAG benchmarks are typically created by generating questions directly from the documents in the knowledge base, which may not reflect the diversity of real user questions. We introduce GapView, a framework for evaluating whether the knowledge base in a RAG pipeline provides sufficient coverage to support expected user questions. GapView uses cosine similarity between embeddings and two-dimensional multidimensional scaling (MDS) projections to check whether a question is semantically aligned with any document in the corpus. We evaluate GapView on six synthetic datasets from clinical and programming domains. Results show that GapView achieves high F1 scores ($\geq 0.93$) in predicting coverage and reveals domain-specific performance differences. Unlike traditional RAG metrics, GapView identifies knowledge gaps and provides clear visualizations that reveal where information is missing. Our findings highlight the importance of validating knowledge base coverage in RAG pipelines and offer a scalable method for flagging unsupported questions before they enter the pipeline.
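To make the abstract's coverage check concrete, the sketch below pairs a cosine-similarity test between a question embedding and all document embeddings with an MDS projection of the joint embedding space. This is a minimal illustration under stated assumptions, not the paper's implementation: the `all-MiniLM-L6-v2` encoder, the `COVERAGE_THRESHOLD` value, and the helper names are placeholders chosen for the example.

```python
# Minimal sketch of an embedding-based coverage check, assuming the
# sentence-transformers and scikit-learn libraries are available. The
# encoder name and similarity threshold are illustrative, not the
# settings used in the paper.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.manifold import MDS

COVERAGE_THRESHOLD = 0.5  # hypothetical cutoff; a real system would tune this

def coverage_check(question, documents, model_name="all-MiniLM-L6-v2"):
    """Return (is_covered, per-document similarities) for one question."""
    model = SentenceTransformer(model_name)
    q_emb = model.encode([question])            # shape (1, d)
    d_embs = model.encode(documents)            # shape (n_docs, d)
    sims = cosine_similarity(q_emb, d_embs)[0]  # similarity to each document
    return bool(sims.max() >= COVERAGE_THRESHOLD), sims

def project_2d(question, documents, model_name="all-MiniLM-L6-v2"):
    """Project the question and documents to 2D with MDS for inspection."""
    model = SentenceTransformer(model_name)
    embs = model.encode([question] + documents)
    # Use cosine distances so the projection reflects the same geometry
    # as the coverage check above.
    dists = np.clip(1.0 - cosine_similarity(embs), 0.0, None)
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    return mds.fit_transform(dists)  # row 0 is the question, rest are documents
```

In this sketch, a question whose nearest document falls below the threshold would be flagged as unsupported before retrieval, and its outlying position in the 2D projection makes the gap visible.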
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: retrieval-augmented generation, evaluation methodologies, automatic evaluation of datasets, benchmarking, question answering, interpretability
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: N/A
B1 Elaboration: We created all artifacts used in the paper; see Section 3 for details. Artifacts will be released upon acceptance of the manuscript.
B2 Discuss The License For Artifacts: N/A
B3 Artifact Use Consistent With Intended Use: N/A
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B5 Documentation Of Artifacts: N/A
B6 Statistics For Data: N/A
C Computational Experiments: Yes
C1 Model Size And Budget: N/A
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: See Section 4 for details on the experimental setup
C3 Descriptive Statistics: Yes
C3 Elaboration: See Section 5 for descriptive statistics of the results
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: Yes
D1 Elaboration: The instructions given to annotators are described in Section 4.2
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Generative AI was used for grammar assistance; the authors take full responsibility for the content.
Author Submission Checklist: Yes
Submission Number: 1014