DeepResearchGuard: Deep Research with Open Domain Evaluation and Multi-Stage Guardrails for Safety

ICLR 2026 Conference Submission 22406 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Agent, Large Language Model, Guardrail, DeepResearch, Open-domain Evaluation
Abstract: Current deep research frameworks lack adequate evaluation procedures and stage-specific safeguards. Prior work primarily treats evaluation as question-answering accuracy and overlooks report quality, especially credibility, coherence, breadth, depth, and safety, which allows hazardous or malicious sources to be integrated into the final report. To address these issues, we introduce DeepResearchGuard, a four-stage safeguard covering input, plan, research, and output, integrated with open-domain evaluation of references and reports. We assess defense success rates, refusal rates, F1, FNR, FPR, and the five report dimensions across diverse LLMs, including gpt-4o, gemini-2.5-flash, DeepSeek-v3, and o4-mini. The average defense success rate increases by 18.16%, and the over-refusal rate decreases by 6%. The input guard contributes the most significant early gain by filtering out obvious risks, the plan and research guards improve citation discipline and source credibility, and the output guard strengthens structure, attribution, and risk disclosure. Examining how sensitive performance is to the choice of guard model, we identify a trade-off between safety and performance: gpt-5-mini, which offers enhanced security, yields a less in-depth report, whereas gpt-4o, with more basic settings, carries higher risk but produces a more in-depth report. For potentially risky queries, more advanced models may discard more of the meaningful resources because of their heightened security checks. Our experiments show that DeepResearchGuard provides open-domain evaluation and stage-aware defenses that block harmful propagation and systematically improve report quality without over-refusal. The code is available at https://anonymous.4open.science/r/DeepResearchGuard-6A75/.
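For readers unfamiliar with the classification metrics named in the abstract, the sketch below computes F1, FNR, and FPR from confusion counts using their standard definitions. It is illustrative only and is not taken from the authors' code: the names `Confusion` and `guard_metrics` are hypothetical, and treating "harmful" as the positive class is an assumption. Defense success rate and over-refusal rate are omitted because the abstract does not define them.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion counts for a guard; 'harmful' is assumed to be the positive class."""
    tp: int  # harmful items the guard flagged
    fp: int  # benign items the guard flagged
    tn: int  # benign items the guard let through
    fn: int  # harmful items the guard let through


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def guard_metrics(c: Confusion) -> dict:
    """Standard F1, false negative rate, and false positive rate."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "f1": _ratio(2 * precision * recall, precision + recall),
        "fnr": _ratio(c.fn, c.fn + c.tp),  # share of harmful items missed
        "fpr": _ratio(c.fp, c.fp + c.tn),  # share of benign items flagged
    }


if __name__ == "__main__":
    # Made-up counts, used only to exercise the function.
    print(guard_metrics(Confusion(tp=8, fp=2, tn=8, fn=2)))
```

Under this convention, a lower FNR means fewer harmful items slip past the guard, while a lower FPR means fewer benign items are blocked.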
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 22406