GapView: Measuring Knowledge Base Fitness in RAG Systems

ACL ARR 2026 January Submission 2185 Authors

02 Jan 2026 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | Readers: Everyone | License: CC BY 4.0
Keywords: retrieval-augmented generation, knowledge base sufficiency, pre-retrieval evaluation, cosine similarity, text embeddings, embedding dimensionality, Matryoshka embeddings, semantic similarity, interpretability, visualization
Abstract: Retrieval-augmented generation (RAG) systems extend large language models by grounding them in external documents. However, most evaluations measure retrieval or generation quality rather than asking a more basic question: does the knowledge base itself contain the information needed to answer user questions? This work introduces GapView, a diagnostic framework that measures knowledge-base sufficiency before retrieval occurs. GapView computes cosine similarity between question and document embeddings, analyzes the stability of those similarities across embedding dimensions, and visualizes the resulting relationships using multidimensional scaling (MDS), polar plots, and one-dimensional ranked plots. On six small synthetic datasets drawn from programming and medical text, cosine similarity correlates with human judgments at moderate to strong levels, with correlation values ranging from 0.32 to 0.83. Dimensionality analysis shows that semantic clarity is lost below 100 dimensions. Visualization analysis shows that MDS contributes little diagnostic value because it fails to distinguish which questions relate to which documents, whereas the simpler polar and one-dimensional ranked plots make numerical patterns intuitive and suggest where the knowledge base lacks sufficient information. Together, these components make GapView an interpretable, pre-retrieval method for detecting missing knowledge and assessing the completeness of the knowledge base for RAG systems.
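The pre-retrieval check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy vectors, the 0.5 threshold, and the choice to use each question's best-matching document as its score are all assumptions made for illustration. Truncating vectors to their leading dimensions is likewise an assumed stand-in for the paper's dimensionality analysis, in the way Matryoshka-style embeddings are commonly shortened.

```python
import math
from typing import List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match_scores(
    questions: Sequence[Sequence[float]],
    documents: Sequence[Sequence[float]],
    dims: Optional[int] = None,
) -> List[float]:
    """For each question, return its highest cosine similarity to any document.

    If `dims` is given, only the first `dims` components of every vector are
    used (an assumed prefix truncation, as with Matryoshka-style embeddings).
    """
    scores = []
    for q in questions:
        q_vec = q if dims is None else q[:dims]
        scores.append(
            max(cosine(q_vec, d if dims is None else d[:dims]) for d in documents)
        )
    return scores


def flag_gaps(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Indices of questions whose best score is below a hypothetical threshold."""
    return [i for i, s in enumerate(scores) if s < threshold]


# Toy 4-dimensional "embeddings" (hypothetical values, not real model output).
documents = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
questions = [[0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

full_scores = best_match_scores(questions, documents)
short_scores = best_match_scores(questions, documents, dims=2)
gaps = flag_gaps(full_scores)
```

In this toy setup the second question points in a direction that no document covers, so it is the one flagged as a possible knowledge-base gap. Comparing `full_scores` with `short_scores` shows how scores can be rechecked at a reduced dimensionality.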
Paper Type: Long
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: retrieval-augmented generation, RAG evaluation, knowledge base sufficiency, pre-retrieval diagnostics, embedding similarity, question answering
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 2185