Keywords: Retrieval-Augmented Generation, Large Language Model, Scientific Simulator, Scientific QA
Abstract: Large language models (LLMs) show promise in
solving scientific problems. They can help
generate long-form answers to scientific questions, which are crucial for
a comprehensive understanding of complex phenomena that require detailed
explanations spanning multiple interconnected concepts and evidence.
However, LLMs often suffer from hallucination, especially in
the challenging task of long-form scientific question answering.
Retrieval-Augmented Generation (RAG) approaches can ground LLMs by
incorporating external knowledge sources to improve trustworthiness.
In this context, scientific simulators, which play a vital role in
validating hypotheses, offer a particularly promising retrieval source
to mitigate hallucination and enhance answer factuality.
However, existing RAG approaches cannot be directly applied to
scientific simulation-based retrieval due to two
fundamental challenges: how to retrieve from scientific
simulators, and how to efficiently verify and update long-form answers.
To overcome these challenges, we propose the
simulator-based RAG framework (SimulRAG)
and provide a long-form scientific QA benchmark covering climate science and
epidemiology, with ground truth verified by both simulations and
human annotators. Within this framework, we introduce a generalized simulator retrieval interface
that transforms between textual and numerical modalities. We further design
a claim-level generation method that utilizes uncertainty estimation scores
and simulator boundary assessment (UE+SBA) to efficiently verify and update claims.
Extensive experiments demonstrate that SimulRAG outperforms traditional
RAG baselines by 30.4\% in informativeness and
16.3\% in factuality. UE+SBA further improves the efficiency
and quality of claim-level generation.
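The abstract does not specify how UE+SBA gates claim verification; the minimal Python sketch below shows one way such a claim-level verify-and-update loop could be organized. Every name, threshold, and interface in it is a hypothetical assumption for illustration, not the paper's implementation.

```python
# Illustrative sketch (not the authors' released code): a hypothetical claim-level
# verify-and-update loop combining uncertainty estimation (UE) and simulator
# boundary assessment (SBA). All names and thresholds are assumptions.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Claim:
    text: str                         # atomic claim extracted from the long-form answer
    uncertainty: float                # UE score in [0, 1]; higher means less confident
    verified: Optional[bool] = None   # set once the simulator has checked the claim


def verify_and_update(
    claims: List[Claim],
    in_simulator_scope: Callable[[Claim], bool],   # SBA: can the simulator address this claim?
    simulate_and_check: Callable[[Claim], bool],   # run the simulator and compare with the claim
    revise_claim: Callable[[Claim], Claim],        # rewrite the claim using simulator output
    ue_threshold: float = 0.5,                     # hypothetical uncertainty cutoff
) -> List[Claim]:
    """Verify only claims that are both uncertain (UE) and simulatable (SBA);
    confident or out-of-scope claims are kept as-is, which is where an
    efficiency gain over verifying every claim would come from."""
    updated: List[Claim] = []
    for claim in claims:
        if claim.uncertainty < ue_threshold or not in_simulator_scope(claim):
            # Skip: either the model is already confident, or the simulator
            # cannot ground this claim, so no simulation call is spent here.
            updated.append(claim)
            continue
        claim.verified = simulate_and_check(claim)
        updated.append(claim if claim.verified else revise_claim(claim))
    return updated
```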
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 16847