PaperScout: An Autonomous Agent for Academic Paper Search with Process-Aware Sequence-Level Policy Optimization

ACL ARR 2026 January Submission 6380 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: Reinforcement Learning, Large Language Model, Paper Search
Abstract: Academic paper search is a fundamental task in scientific research, yet most existing approaches rely on predefined workflows, which limits their flexibility when handling complex queries. We propose PaperScout, an autonomous agent that formulates paper search as a sequential decision-making process, enabling the agent to dynamically decide when and how to invoke search and reference expansion actions over multiple turns. To train such an agent stably, we introduce Proximal Sequence Policy Optimization (PSPO), a process-aware, sequence-level policy optimization method that aligns optimization with agent-environment interaction. Comprehensive experiments show that PaperScout trained with PSPO achieves superior retrieval performance compared to existing methods.
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: LLM agents; tool use
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 6380