PaperScout: An Autonomous Agent for Academic Paper Search with Process-Aware Sequence-Level Policy Optimization
Keywords: Reinforcement Learning, Large Language Model, Paper Search
Abstract: Academic paper search is a fundamental task in scientific research, yet most existing approaches rely on predefined workflows, which limits their flexibility when handling complex queries. We propose PaperScout, an autonomous agent that formulates paper search as a sequential decision-making process. This lets the agent decide dynamically, over multiple turns, when and how to invoke search and reference-expansion actions. To train such an agent stably, we introduce Proximal Sequence Policy Optimization (PSPO), a process-aware, sequence-level policy optimization method that aligns optimization with agent–environment interaction. Comprehensive experiments show that PaperScout trained with PSPO outperforms existing methods in retrieval performance.
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: LLM agents; tool use
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 6380