PaperScout: An Autonomous Agent for Academic Paper Search with Process-Aware Sequence-Level Policy Optimization

ACL ARR 2026 January Submission6380 Authors

05 Jan 2026 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | Readers: Everyone | License: CC BY 4.0

Keywords: Reinforcement Learning, Large Language Model, Paper Search

Abstract: Academic paper search is a fundamental task in scientific research, yet most existing approaches rely on predefined workflows, which limits their flexibility when handling complex queries. We propose PaperScout, an autonomous agent that formulates paper search as a sequential decision-making process, enabling the agent to dynamically decide when and how to invoke search and reference expansion actions over multiple turns. To train such an agent stably, we introduce Proximal Sequence Policy Optimization (PSPO), a process-aware, sequence-level policy optimization method that aligns optimization with agent-environment interaction. Comprehensive experiments show that PaperScout trained with PSPO achieves superior retrieval performance compared to existing methods.
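
Illustrative sketch (not from the paper): the multi-turn formulation in the abstract, where an agent chooses between a search action and a reference expansion action at each turn, might look roughly like the loop below. Everything here is an assumption made for illustration: the toy `CORPUS`, the `search` and `expand` helpers, the `State` fields, and the rule-based `policy` are hypothetical stand-ins. In PaperScout the policy is a trained LLM, and the PSPO training procedure is not shown because the abstract does not specify it.

```python
from dataclasses import dataclass, field

# Hypothetical toy corpus: paper id -> (title, ids of the papers it references).
CORPUS = {
    "p1": ("Policy optimization for LLM agents", ["p2", "p3"]),
    "p2": ("Dense retrieval for scientific papers", []),
    "p3": ("Tool use in language models", ["p2"]),
}


def search(query):
    """Toy keyword search: ids of papers whose title contains any query word."""
    words = query.lower().split()
    return [pid for pid, (title, _) in CORPUS.items()
            if any(w in title.lower() for w in words)]


def expand(paper_id):
    """Toy reference expansion: ids of the papers cited by the given paper."""
    return list(CORPUS[paper_id][1])


@dataclass
class State:
    query: str
    pool: list = field(default_factory=list)    # papers found so far, in discovery order
    expanded: set = field(default_factory=set)  # papers whose references were already followed
    searched: bool = False


def policy(state):
    """Placeholder rule-based policy (assumption); a trained LLM would decide here."""
    if not state.searched:
        return ("search", state.query)
    for pid in state.pool:
        if pid not in state.expanded:
            return ("expand", pid)
    return ("stop", None)


def run_episode(query, max_turns=5):
    """Run one multi-turn episode and return the collected paper ids."""
    state = State(query)
    for _ in range(max_turns):
        action, arg = policy(state)
        if action == "stop":
            break
        if action == "search":
            found = search(arg)
            state.searched = True
        else:  # "expand"
            found = expand(arg)
            state.expanded.add(arg)
        # The environment's observation updates the state: add unseen papers to the pool.
        for pid in found:
            if pid not in state.pool:
                state.pool.append(pid)
    return state.pool
```

Calling `run_episode("policy optimization")` on this toy corpus first searches, then follows references turn by turn until the placeholder policy chooses to stop or the turn budget runs out. The point of the sketch is only the control flow: the action at each turn depends on the state accumulated from earlier turns rather than on a fixed pipeline.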

Paper Type: Long

Research Area: AI/LLM Agents

Research Area Keywords: LLM agents; tool use

Contribution Types: NLP engineering experiment

Languages Studied: English

Submission Number: 6380
