Know Or Not: a library for systematically evaluating out-of-knowledge base robustness

Published: 06 Nov 2025, Last Modified: 06 Nov 2025
Venue: AIR-FM Poster
License: CC BY 4.0
Keywords: llm, robustness, abstention
Abstract: Large language models (LLMs) have achieved remarkable progress, yet their deployment in high-stakes domains remains limited by hallucination risks. Retrieval-augmented generation (RAG) mitigates these risks but cannot guarantee reliability when queries fall outside the knowledge base, where abstention is expected. We present a novel methodology for evaluating out-of-knowledge-base (OOKB) robustness, assessing whether LLMs know or not, in RAG settings without requiring manually annotated gold answers. Our approach is implemented in \texttt{knowornot}, an open-source library for constructing customizable OOKB robustness benchmarks. \texttt{knowornot} features (1) a unified, high-level API for streamlined evaluation, (2) a modular architecture supporting diverse LLM clients and retrieval configurations, (3) rigorous data modeling ensuring reproducibility and traceability, and (4) flexible tools for building tailored robustness pipelines. This work enables systematic, reproducible assessment of abstention behavior in LLM-based RAG systems, advancing their reliability for high-stakes applications.
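One way an OOKB benchmark can avoid manually annotated gold answers is leave-one-out construction: for each question, drop the fact that answers it from the retrieved context, so abstention becomes the verifiably correct response. The Python sketch below illustrates that general idea under stated assumptions; it is not \texttt{knowornot}'s actual API, and all names here (\texttt{build_prompt}, \texttt{ookb_abstention_rate}, \texttt{ABSTAIN_MARKER}, the \texttt{ask_llm} callable) are hypothetical.

```python
# Hypothetical sketch of leave-one-out OOKB abstention evaluation.
# Not knowornot's API; any text-in/text-out LLM client can be plugged in.
from typing import Callable, List, Tuple

ABSTAIN_MARKER = "I don't know"  # assumed abstention phrase the prompt requests


def build_prompt(context: List[str], question: str) -> str:
    """Ask the model to answer only from the given context, else abstain."""
    facts = "\n".join(f"- {fact}" for fact in context)
    return (
        "Answer using ONLY the context below. "
        f"If the answer is not in the context, reply exactly '{ABSTAIN_MARKER}'.\n"
        f"Context:\n{facts}\n\nQuestion: {question}\nAnswer:"
    )


def ookb_abstention_rate(
    qa_pairs: List[Tuple[str, str]],  # (question, supporting fact) pairs
    ask_llm: Callable[[str], str],    # any prompt -> completion function
) -> float:
    """Leave-one-out: remove the fact answering each question, making the
    query out-of-knowledge-base, then measure how often the model abstains."""
    abstained = 0
    for i, (question, _fact) in enumerate(qa_pairs):
        # Context contains every fact EXCEPT the one answering this question.
        context = [fact for j, (_q, fact) in enumerate(qa_pairs) if j != i]
        reply = ask_llm(build_prompt(context, question))
        abstained += ABSTAIN_MARKER.lower() in reply.lower()
    return abstained / len(qa_pairs)
```

The marker-string check is only for brevity; a more robust setup would score abstention with an LLM judge or a classifier rather than substring matching.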
Submission Track: Workshop Paper Track
Submission Number: 28