Boosting-Inspired Validation of Retrieval-Augmented Generation in Structured Scientific Knowledge Bases
Keywords: proof-of-concept, human–AI collaboration, Large Language Models
TL;DR: A proof-of-concept study in which a human researcher supervised ChatGPT (GPT-4/5) in creating a scientific paper — resulting in the paper itself.
Abstract: Large Language Models (LLMs) enhanced with Retrieval-Augmented Generation (RAG) achieve remarkable results, yet they often hallucinate or provide incomplete answers. This poses critical challenges in scientific knowledge domains, where factuality and precision are essential. In this paper, we propose a boosting-inspired evaluation framework for RAG that combines iterative error reduction with forward-looking retrieval mechanisms from FLARE. Unlike existing work that primarily optimizes retrieval or ranking, our focus is on the validation loop itself. We validate the framework in a controlled scenario using Citavi, a structured literature management system that serves as a reproducible test environment. Results indicate that strict substring matching underestimates semantic correctness, while boosting-inspired metrics highlight when query expansion is necessary. This proof-of-concept demonstrates technical feasibility and motivates iterative, semantic validation for future scientific assistants.
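The validation loop described in the abstract can be sketched as follows. This is a minimal, illustrative sketch only: all function names (`substring_match`, `boosting_validate`, and the toy answer/reference data) are hypothetical placeholders invented for illustration, not the paper's actual implementation. It shows the two ideas the abstract contrasts: a strict substring check that can miss semantically correct paraphrases, and a boosting-style reweighting of failed queries that flags where expansion is needed.

```python
def substring_match(answer: str, reference: str) -> bool:
    """Strict substring check: misses semantically correct paraphrases."""
    return reference.lower() in answer.lower()


def boosting_validate(queries, answer_fn, reference_fn, rounds=3):
    """Boosting-inspired validation: repeatedly check each query's answer
    and double the weight of queries that fail, mimicking boosting's focus
    on hard examples. High final weights mark queries needing expansion."""
    weights = {q: 1.0 for q in queries}
    for _ in range(rounds):
        failed = [q for q in queries
                  if not substring_match(answer_fn(q), reference_fn(q))]
        if not failed:
            break  # all answers validated; no expansion needed
        for q in failed:
            weights[q] *= 2.0  # boost weight of persistently failing queries
    return weights


# Toy usage with canned answers standing in for a RAG system:
answers = {"q1": "Citavi stores references", "q2": "unrelated text"}
refs = {"q1": "references", "q2": "projects"}
w = boosting_validate(["q1", "q2"], answers.get, refs.get, rounds=2)
```

In this toy run, `q1` passes immediately and keeps weight 1.0, while `q2` fails both rounds and accumulates weight 4.0, signaling a candidate for query expansion or semantic (rather than substring) validation.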
Supplementary Material: pdf
Submission Number: 170