Boosting-Inspired Validation of Retrieval-Augmented Generation in Structured Scientific Knowledge Bases
Keywords: proof-of-concept, human–AI collaboration, Large Language Models
TL;DR: A proof-of-concept study in which a human researcher supervised ChatGPT (GPT-4/5) in creating a scientific paper — resulting in the paper itself.
Abstract: Large Language Models (LLMs) enhanced with Retrieval-Augmented Generation (RAG) achieve remarkable results, yet they often hallucinate or provide incomplete answers. This poses critical challenges in scientific knowledge domains, where factuality and precision are essential. In this paper, we propose a boosting-inspired evaluation framework for RAG that combines iterative error reduction with forward-looking retrieval mechanisms from FLARE. Unlike existing work that primarily optimizes retrieval or ranking, our focus is on the validation loop itself. We validate the framework in a controlled scenario using Citavi, a structured literature management system that serves as a reproducible test environment. Results indicate that strict substring matching underestimates semantic correctness, while boosting-inspired metrics highlight when query expansion is necessary. This proof-of-concept demonstrates technical feasibility and motivates iterative, semantic validation for future scientific assistants.
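The validation loop described in the abstract can be sketched as follows. This is a minimal, illustrative sketch only: all function names (`substring_match`, `boosting_validate`, and the toy answer/reference data) are hypothetical placeholders invented for illustration, not the paper's actual implementation. It shows the two ideas the abstract contrasts: a strict substring check that can miss semantically correct paraphrases, and a boosting-style reweighting of failed queries that flags where expansion is needed.

```python
def substring_match(answer: str, reference: str) -> bool:
    """Strict substring check: misses semantically correct paraphrases."""
    return reference.lower() in answer.lower()


def boosting_validate(queries, answer_fn, reference_fn, rounds=3):
    """Boosting-inspired validation: repeatedly check each query's answer
    and double the weight of queries that fail, mimicking boosting's focus
    on hard examples. High final weights mark queries needing expansion."""
    weights = {q: 1.0 for q in queries}
    for _ in range(rounds):
        failed = [q for q in queries
                  if not substring_match(answer_fn(q), reference_fn(q))]
        if not failed:
            break  # all answers validated; no expansion needed
        for q in failed:
            weights[q] *= 2.0  # boost weight of persistently failing queries
    return weights


# Toy usage with canned answers standing in for a RAG system:
answers = {"q1": "Citavi stores references", "q2": "unrelated text"}
refs = {"q1": "references", "q2": "projects"}
w = boosting_validate(["q1", "q2"], answers.get, refs.get, rounds=2)
```

In this toy run, `q1` passes immediately and keeps weight 1.0, while `q2` fails both rounds and accumulates weight 4.0, signaling a candidate for query expansion or semantic (rather than substring) validation.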
Supplementary Material: pdf
Submission Number: 170