SciNetBench: A Relation-Aware Benchmark for Scientific Literature Retrieval Agents

Chenyang Shao; Fengli Xu; Yong Li

18 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0.

Keywords: Deep Research, LLM Agent, Literature Retrieval, Scientometrics

Abstract: The rapid progress of AI agents has spurred advanced research tools such as *Deep Research*. Building such tools requires a nuanced understanding of the relations within scientific literature, which lies beyond the scope of keyword-based or embedding-based retrieval. Existing retrieval agents focus mainly on content-level similarity and cannot decode critical relational dynamics, such as identifying corroborating or conflicting studies or tracing technological lineages, both of which are essential for a comprehensive literature review. This fundamental limitation often results in fragmented knowledge structures, misleading interpretations of scholarly sentiment, and inadequate modeling of collective scientific progress. To investigate relation-aware retrieval in depth, we propose **SciNetBench**, the first **Sci**entific **Net**work Relation-aware **Bench**mark for literature retrieval agents. Constructed from a corpus of over 18 million AI papers, the benchmark systematically evaluates three levels of relations: ego-centric retrieval of papers with novel knowledge structures, pair-wise identification of scholarly relationships, and path-wise reconstruction of scientific evolutionary trajectories. Through an extensive evaluation of three categories of retrieval agents, we find that their accuracy on relation-aware retrieval tasks often falls below 20%, revealing a core shortcoming of current retrieval paradigms. Further experiments on literature review tasks show that providing agents with relational ground truth improves review quality by a substantial 23.4%, confirming the critical importance of relation-aware retrieval. We publicly release our benchmark at [https://anonymous.4open.science/r/SciNetBench/](https://anonymous.4open.science/r/SciNetBench/) to support future research on advanced retrieval systems.

Primary Area: datasets and benchmarks

Submission Number: 10065
