Keywords: Agentic Search, Evaluation, Large Language Models
Abstract: Agentic search, as a more autonomous and adaptive paradigm of retrieval augmentation, is driving the evolution of intelligent search systems. However, existing evaluation frameworks fail to align well with the real goals of agentic search. First, the complex queries with short-form answers used in existing evaluations often deviate from realistic user search scenarios. Second, most evaluations focus solely on end-to-end performance, neglecting assessment of the iterative process inherent to agentic search. To address these limitations, we propose RAVine---a Reality-Aligned eValuation framework for agentic LLMs with search. RAVine targets real user queries that require multi-faceted search and long-form answers, and introduces an attributable nugget construction strategy to improve the precision and consistency of long-form evaluation. Moreover, RAVine examines models with process-oriented metrics, including search tool performance and efficiency. We benchmark a series of models using RAVine and derive several insights, which we hope will contribute to advancing the development of agentic search systems.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, evaluation methodologies, evaluation, metrics
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 8465