Keywords: LLM, Agent, Literature Search, Deep Research
TL;DR: An agentic literature-review framework whose reflective expansion and critic-guided refinement loops achieve a 53.46 RACE score on DeepResearch-Bench, beating every reported commercial deep research agent.
Abstract: We present IRIS (Iterative Refinement for Information Synthesis), an agentic framework for automated scientific literature review. IRIS breaks the research process into three phases (iterative planning, parallel per-section research and writing, and final report compilation), orchestrated as a directed state graph. Two mechanisms are central to the design. First, reflective research expansion: an LLM-based critic repeatedly evaluates how well the accumulated evidence covers the target topic and generates new queries until a sufficiency threshold is met. Second, critic-guided refinement: when a section fails quality review, an LLM critic pinpoints what is missing (mechanistic detail, quantitative data, recent findings) and issues targeted follow-up searches rather than simply requesting a rewrite. Citations are tracked at the claim level, mapping every factual assertion back to specific source passages. On DeepResearch-Bench (50 PhD-level English research tasks across 22 domains), IRIS achieves a RACE overall score of 53.46, outperforming every commercial system reported in the benchmark, including Gemini 2.5 Pro Deep Research (48.88) and OpenAI Deep Research (46.98). IRIS leads on all four RACE dimensions despite being restricted to PubMed and arXiv, with no open-web retrieval.
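The reflective research expansion loop described in the abstract can be illustrated with a minimal sketch. This is not the IRIS implementation: the names `ResearchState`, `reflective_expansion`, `toy_search`, and `toy_critic`, the 0-to-1 coverage score, the default threshold, and the `max_rounds` cap are all assumptions made for illustration, and the search backend and critic are stand-in callables rather than real PubMed, arXiv, or LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ResearchState:
    """Evidence and query history accumulated for one topic (hypothetical)."""
    topic: str
    evidence: List[str] = field(default_factory=list)
    queries_run: List[str] = field(default_factory=list)


def reflective_expansion(
    topic: str,
    search: Callable[[str], List[str]],
    critic: Callable[[str, List[str]], Tuple[float, List[str]]],
    threshold: float = 0.8,  # assumed sufficiency threshold, not from the paper
    max_rounds: int = 5,     # assumed safety cap, not from the paper
) -> ResearchState:
    """Search, let the critic score coverage, and expand until sufficient."""
    state = ResearchState(topic=topic)
    pending = [topic]
    for _ in range(max_rounds):
        for query in pending:
            if query in state.queries_run:
                continue  # skip queries that were already issued
            state.queries_run.append(query)
            state.evidence.extend(search(query))
        # The critic returns a coverage score and follow-up queries for gaps.
        score, pending = critic(topic, state.evidence)
        if score >= threshold or not pending:
            break
    return state


# Toy stand-ins so the sketch runs without a search API or an LLM.
def toy_search(query: str) -> List[str]:
    return [f"passage about {query}"]


def toy_critic(topic: str, evidence: List[str]) -> Tuple[float, List[str]]:
    score = min(1.0, len(evidence) / 3)
    gaps = [f"{topic} gap {len(evidence)}"] if score < 1.0 else []
    return score, gaps


state = reflective_expansion("crispr", toy_search, toy_critic, threshold=1.0)
```

In this sketch the critic doubles as the query generator, which mirrors the abstract's description of a critic that both judges coverage and proposes new queries. Critic-guided refinement of a failed section could plausibly reuse the same loop shape, with the critic's gap labels (mechanistic detail, quantitative data, recent findings) mapped to targeted follow-up queries.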
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 38