Plan/Search: Structured Planning as a Training Signal for Retrieval-Augmented Reasoning

ACL ARR 2026 January Submission2501 Authors

03 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: retrieval-augmented reasoning, multi-hop question answering, reinforcement learning, tool-augmented language models, structured planning, chain-of-thought
Abstract: Recent work has shown that reinforcement learning (RL) can train language models to use retrieval tools effectively for multi-hop question answering. However, existing approaches rely on implicit reasoning within free-form chain-of-thought, leaving the model to discover effective search strategies on its own. We propose Plan/Search, which introduces explicit planning scaffolds---structured templates that decompose reasoning into goal-setting, progress tracking, and action planning---as an on-demand action learned through RL. Our approach outperforms Search-R1 baselines across four multi-hop QA benchmarks and demonstrates strong zero-shot transfer to the GAIA benchmark. Analysis of training dynamics reveals that the models learn qualitatively different strategies: Plan/Search develops short, frequent interactions that track explicit sub-goals and enable course correction through environmental feedback, while Search-R1 favors longer, monolithic reasoning chains that risk compounding errors by filling knowledge gaps through internal inference. Ablations show that explicit progress tracking is the most critical component and that on-demand invocation outperforms mandatory structure. Our findings suggest that structured scaffolds act as an inductive bias that shapes how models learn to coordinate reasoning with external tools.
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: Question Answering, Information Retrieval and Text Mining, Language Modeling, Machine Learning for NLP
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 2501