Track: Track 1: Original Research/Position/Education/Attention Track
TL;DR: Deep research system for literature review writing with all agents powered by diffusion language models.
Abstract: Deep Research systems built on autoregressive large language models inherit the limitations of left-to-right decoding, including outline drift, error propagation across sections, and the inability to revise early structural decisions once new evidence is gathered downstream. As reviews grow longer and rely on dozens of retrieved sources, these constraints translate into measurable losses in comprehensiveness and insight, since later findings cannot meaningfully reshape earlier commitments. We argue that a fundamentally different decoding paradigm is required to close this gap. We introduce DiffResearch, the first Deep Research framework to place a diffusion language model at the core of its writing stage, enabling parallel refinement of an entire review rather than section-by-section commitment. The system combines a lightweight multi-agent scaffold (intent classification, query reformulation, planning, retrieval, writing, and judging) with two operating modes: a base mode for single-pass synthesis, and an iterative subquery-decomposition mode in which a judge agent identifies coverage gaps and triggers additional retrieval rounds until the evidence base is sufficient. On Deep Research Bench, DiffResearch achieves an overall score of 48.03, surpassing openai-deepresearch (46.45), Dr. Tulu (45.49), and claude-research (45.00), with consistent gains across comprehensiveness (46.95), insight (48.20), instruction following (49.41), and readability (47.40).
Keywords: Deep Research, AI4Science, LLM, diffusion, MDLM, diffusion LLM
Submission Number: 119