Writing As Reasoning: Interleaving Drafting and Deepening for Open-Ended Deep Research

ACL ARR 2026 January Submission8151 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: LLM-based agent, open-ended deep research, deep information seeking, knowledge-intensive long writing
Abstract: Generating deep research reports requires large-scale information acquisition and the synthesis of insight-driven analyses, posing a major challenge for current language models. Existing methods largely follow a *plan-then-write* paradigm, whose performance depends strongly on the quality of the initial outline. We propose the **Writing As Reasoning Policy (WARP)** framework, which enables models to dynamically revise outlines during report writing. Under this policy, the agent alternates between ***Evidence-Based Drafting*** and ***Reasoning-Driven Deepening*** phases, jointly supporting information acquisition, knowledge refinement, and outline updating. We further introduce a **Multi-Stage Agentic Training** pipeline—comprising cold start, atomic skill RL, and holistic pipeline RL—that enables small models to operate WARP effectively. Experiments on DeepResearch Bench, DeepConsult, and DeepResearch Gym show that our approach allows small models to surpass leading closed-source systems, with particularly substantial improvements on the *Insight* metric.
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: LLM agents, reinforcement learning in agents, planning in agents, environment interaction
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English, Chinese
Submission Number: 8151