Decomposing Scientific Paper Queries with Draft-and-Follow Policy Optimization to Narrow Knowing-Doing Gap
Keywords: Agentic Reinforcement Learning, LLM, Scientific Paper QA
TL;DR: We significantly improve the interaction efficiency of small LLMs without compromising performance by introducing a hierarchical reinforcement learning architecture.
Abstract: The rapid growth in the volume of scientific papers makes it difficult for researchers to keep up with the latest advances in their field through manual reading alone. Given recent advances in Large Language Models (LLMs), there is a growing trend of employing autonomous agents to extract key information from scientific papers. Although promising, existing approaches generally rely on either meticulously engineered prompts or a standard SFT-RL pipeline, both of which are prone to excessive and ineffective exploration. Inspired by cognitive science, we introduce \textbf{PaperCompass}, a novel framework designed to address these limitations. Specifically, PaperCompass first generates a draft outlining the sequence of planned execution steps and then engages in fine-grained reasoning to determine the parameters of the corresponding function calls. To support this process, we develop a bespoke RL method named \textbf{D}raft-and-\textbf{F}ollow \textbf{P}olicy \textbf{O}ptimization, which concurrently optimizes both the draft plan and the final solution. \textbf{DFPO} can be viewed as a streamlined implementation of Hierarchical RL, designed to bridge the `knowing-doing' gap observed in LLMs. We provide a theoretical analysis of DFPO, demonstrating its desirable properties and thereby supporting a reliable optimization process. Experiments on paper-based question-answering (Paper-QA) benchmarks demonstrate PaperCompass's superior efficiency over existing baselines without compromising performance, achieving results comparable to those of much larger models.
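The two-stage control flow described in the abstract (first draft the sequence of steps, then decide the parameters of each function call) can be illustrated with a minimal sketch. This is not the authors' implementation: the tool names `search_section` and `answer`, the `Step` record, and the rule-based stand-ins for the policy are all hypothetical, and the RL training part (DFPO) is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry; names and signatures are illustrative assumptions.
TOOLS: Dict[str, Callable[..., str]] = {
    # Return the first paper section containing the query (case-insensitive).
    "search_section": lambda paper, query: next(
        (s for s in paper if query.lower() in s.lower()), ""
    ),
    # Return the retrieved text as the final answer.
    "answer": lambda text: text.strip(),
}


@dataclass
class Step:
    tool: str   # which function was called
    args: dict  # the parameters chosen for that call


def make_draft(question: str) -> List[str]:
    """Stage 1 (draft): choose the ordered tool names only, no arguments.

    A fixed plan stands in for what would be a learned policy.
    """
    return ["search_section", "answer"]


def run_draft_and_follow(
    question: str, paper: List[str], keyword: str
) -> Tuple[str, List[Step]]:
    """Stage 2 (follow): fill in arguments for each drafted call and execute it."""
    observation = ""
    trace: List[Step] = []
    for tool in make_draft(question):
        if tool == "search_section":
            args = {"paper": paper, "query": keyword}
        elif tool == "answer":
            # The argument depends on the previous step's observation.
            args = {"text": observation}
        else:
            raise ValueError(f"unknown tool in draft: {tool}")
        observation = TOOLS[tool](**args)
        trace.append(Step(tool, args))
    return observation, trace
```

Separating the plan (tool names) from the per-call arguments is what would let a training method assign credit to the draft and to the executed solution separately, which is the property the abstract attributes to DFPO.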
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 3227