Improving and Evaluating Open Deep Research Agents

ICLR 2026 Conference Submission21613 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Deep Research Agents, Autonomous Web Search, Multi-hop Question Answering, Large Language Models, Open-source Benchmarks, BrowseComp, ODR+, Information Retrieval and Synthesis

TL;DR: We present ODR+, an open‑source deep research agent that autonomously answers complex web questions using LLMs, subquestion decomposition, iterative search, and structured answer synthesis.

Abstract: Deep Research Agents (DRAs) are systems that can take a natural language prompt from a user and autonomously search for and utilize internet-based content to address it. Recent DRAs have demonstrated impressive capabilities on public benchmarks, but most research has focused on proprietary, closed-source systems. At the time of this work, we identified only one open-source DRA, Open Deep Research (ODR). To enable systematic comparison, we adapt the challenging BrowseComp benchmark and introduce BrowseComp-Small (BC-Small), a computationally tractable subset designed for academic labs. We benchmark ODR and two proprietary systems from Anthropic and Google on BC-Small, finding that all three achieve 0% accuracy on the 60-question test set. We then propose ODR+, an enhanced version of ODR with sub-question decomposition, iterative planning, and structured synthesis. ODR+ achieves 10% accuracy on BC-Small, the highest among the open-source and closed-source systems we evaluated. Ablation studies confirm that all three improvements contribute to ODR+’s performance.

Supplementary Material: pdf

Primary Area: foundation or frontier models, including LLMs

Submission Number: 21613
