Automatic Scientific Claims Verification with Pruned Evidence Graph

Published: 26 Apr 2025 · Last Modified: 26 Apr 2025 · ICLR 2025 Workshop AgenticAI Poster · CC BY 4.0
Keywords: Scientific Claim Verification, Evidence Graph Pruning, Graph Neural Networks
TL;DR: This paper presents PrunE, a lightweight graph-based framework that enhances scientific claim verification by pruning evidence graphs to improve rationale selection and stance prediction using pretrained language models.
Abstract: Automatic scientific claim verification methods typically retrieve evidence paragraphs, select rationale sentences, and predict each sentence's stance toward a given claim. Large language models (LLMs) are expected to be the next-generation tool for this task. However, because the claims are domain-specific, LLMs trained on large-scale general corpora need at least some external knowledge to warm up. Extracting qualified, relevant sentences together with their stances toward a given claim is therefore indispensable. GraphRAG is designed to learn the hierarchical relationships within a context and selectively retrieve related information, improving LLMs' reasoning in ad-hoc and domain-specific claim verification scenarios. Nevertheless, current GraphRAG methods typically require a pre-existing domain-specific knowledge base. A natural question thus arises: how far are we from automatically building a semantic graph and selecting rationale sentences for a pretrained LLM, and which of these processes should remain independent of the pretrained LLM? In this paper, we present our ongoing research on distilling information across sentences by constructing a complete evidence graph and pruning it to capture the relevant connections between the claim and the paragraph sentences. The pruned graph is used to update the sentence embeddings, which in turn improves multiple-rationale sentence identification and stance prediction. We empirically test the proposed framework on SciFact, an open-access dataset in the biomedical domain. At the current stage, we find that no method, among the selected baselines and our own, outperforms the others across all experimental settings, which points to many future research directions for researchers and practitioners.
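Since the abstract only sketches the pipeline, the following is a minimal illustrative sketch of the described steps: embed the claim and paragraph sentences, build a complete evidence graph, prune weak edges, update embeddings with one message-passing step, then select rationales and predict stances. The function names (build_and_prune_graph, message_pass), the cosine-similarity pruning rule, the keep_ratio threshold, and the linear stance head are all assumptions for illustration, not details from the paper.

```python
# Hypothetical sketch of a PrunE-style pipeline, under the assumptions above.
import torch
import torch.nn.functional as F

def build_and_prune_graph(claim_emb, sent_embs, keep_ratio=0.5):
    """Complete graph over [claim] + sentences, pruned by cosine affinity."""
    nodes = torch.cat([claim_emb.unsqueeze(0), sent_embs], dim=0)  # (N, d)
    sim = F.cosine_similarity(nodes.unsqueeze(1), nodes.unsqueeze(0), dim=-1)
    sim.fill_diagonal_(0.0)                        # no self-loops
    k = max(1, int(keep_ratio * (sim.numel() - sim.size(0))))
    thresh = sim.flatten().topk(k).values.min()    # keep top-k edges overall
    adj = (sim >= thresh).float() * sim            # pruned weighted adjacency
    return nodes, adj

def message_pass(nodes, adj):
    """One GCN-style update: weighted neighbor mean, residual, renormalize."""
    deg = adj.sum(-1, keepdim=True).clamp(min=1e-6)
    agg = (adj @ nodes) / deg
    return F.normalize(nodes + agg, dim=-1)

# Toy usage: in practice the embeddings would come from a pretrained encoder
# (e.g., a biomedical model for SciFact); random vectors stand in here.
d, n_sents = 64, 6
claim_emb = torch.randn(d)
sent_embs = torch.randn(n_sents, d)

nodes, adj = build_and_prune_graph(claim_emb, sent_embs, keep_ratio=0.4)
updated = message_pass(nodes, adj)

# Rationale selection: sentences most aligned with the updated claim node.
scores = F.cosine_similarity(updated[1:], updated[0].unsqueeze(0), dim=-1)
rationale_ids = scores.topk(2).indices             # pick top-2 rationales

# Stance prediction: a small head over [claim; rationale] pairs, with the
# three SciFact labels (SUPPORTS / REFUTES / NOT ENOUGH INFO).
stance_head = torch.nn.Linear(2 * d, 3)
pair = torch.cat([updated[0].expand(len(rationale_ids), d),
                  updated[1:][rationale_ids]], dim=-1)
stance_logits = stance_head(pair)
print(rationale_ids.tolist(), stance_logits.softmax(-1))
```

The key design point this sketch illustrates is that graph construction and pruning happen before any LLM is involved: the pruned adjacency alone reshapes the sentence embeddings, so rationale selection can stay independent of the downstream pretrained model.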
Submission Number: 38