OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis

ACL ARR 2026 March Submission 603 Authors

15 Mar 2026 (modified: 07 Jun 2026) | ACL ARR 2026 March Submission | CC BY 4.0

Keywords: Data Synthesis, LLM Agents, Information Seeking, Deep Research

Abstract: Training deep research agents requires long-horizon trajectories that interleave search, evidence aggregation, and multi-step reasoning. However, existing data collection pipelines typically rely on proprietary web APIs, making large-scale trajectory synthesis costly, unstable, and difficult to reproduce. We present OpenResearcher, a reproducible pipeline that decouples one-time corpus bootstrapping from multi-turn trajectory synthesis and executes the search-and-browse loop entirely offline over a 15M-document corpus using three explicit browser primitives: search, open, and find. Using GPT-OSS-120B as the teacher model, we synthesize over 97K trajectories, including a substantial long-horizon tail with 100+ tool calls. Supervised fine-tuning of a 30B-A3B backbone on these trajectories achieves 54.8% accuracy on BrowseComp-Plus, a 34.0-point improvement over the base model, while remaining competitive on BrowseComp, GAIA, and xbench-DeepSearch. Because the environment is offline and fully instrumented, it also enables controlled analysis; our study yields practical insights into deep research pipeline design, including data filtering strategies, agent configuration choices, and how retrieval success relates to final answer accuracy. We release the pipeline, synthesized trajectories, model checkpoints, and the offline search environment at https://anonymous.4open.science/r/OpenResearcher-5BF7.
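
To make the three browser primitives concrete, the following is a minimal sketch of what an offline search, open, and find environment could look like. It is not the released implementation: the `Document` and `OfflineBrowser` names, the term-overlap scorer, and the sentence-level `find` are all assumptions chosen for illustration, and the two-document corpus stands in for the 15M-document corpus described in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical corpus record; the real corpus schema is not given here."""
    doc_id: str
    title: str
    text: str


class OfflineBrowser:
    """Toy offline environment exposing search, open, and find (illustrative only)."""

    def __init__(self, corpus):
        self.docs = {d.doc_id: d for d in corpus}
        self.current = None  # document most recently opened

    def search(self, query, k=3):
        """Return up to k doc ids ranked by query-term overlap (placeholder scorer)."""
        terms = set(query.lower().split())
        scored = []
        for d in self.docs.values():
            words = set((d.title + " " + d.text).lower().split())
            overlap = len(terms & words)
            if overlap:
                scored.append((overlap, d.doc_id))
        # Highest overlap first; ties broken by doc id for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for _, doc_id in scored[:k]]

    def open(self, doc_id):
        """Load a document and return its full text."""
        self.current = self.docs[doc_id]
        return self.current.text

    def find(self, pattern):
        """Return sentences of the open document that contain the pattern."""
        if self.current is None:
            raise RuntimeError("call open() before find()")
        needle = pattern.lower()
        return [s.strip() for s in self.current.text.split(".") if needle in s.lower()]


# Tiny stand-in corpus, assumed for this example only.
corpus = [
    Document("d1", "Marie Curie",
             "Marie Curie won the Nobel Prize in Physics in 1903. "
             "She later won the Nobel Prize in Chemistry in 1911."),
    Document("d2", "Alan Turing",
             "Alan Turing proposed the Turing machine in 1936."),
]

browser = OfflineBrowser(corpus)
hits = browser.search("curie nobel")   # search: locate candidate documents
page = browser.open(hits[0])           # open: read the top-ranked document
evidence = browser.find("chemistry")   # find: pull the relevant passage
print(hits, evidence)
```

Because every call runs against a fixed local corpus, each tool call in a trajectory can be logged and replayed exactly, which is the property the abstract relies on for reproducibility and controlled analysis.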

Paper Type: Long

Research Area: Information Retrieval and Text Mining

Research Area Keywords: LLM Agents, NLP Applications, Information Retrieval and Text Mining, Question Answering

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Publicly available software and/or pre-trained models, Data resources, Data analysis

Languages Studied: English

Submission Number: 603
