Keywords: Deep Research, Tool use, Multi-agent systems, Benchmark evaluation, E-commerce
TL;DR: DeepResearch Retail is a benchmark framework grounded in e-commerce data for evaluating deep research systems that use web search and internal APIs. Hybrid-ReAct is a multi-agent architecture that allows parallel reasoning and tool use for report generation.
Abstract: Deep Research (DR) systems autonomously retrieve and synthesize information from web sources. However, industrial DR applications face a critical gap: effective integration of internal tools with web search. In this work, we introduce DeepResearch Retail, an evaluation framework grounded in real-world e-commerce data for assessing Deep Research with tools (DR+Tools) in realistic commercial settings. The framework evaluates both factual faithfulness and multidimensional response quality when reasoning over heterogeneous web and internal data sources.
We further present Hybrid-ReAct, a multi-agent architecture that demonstrates how collaborative reasoning and tool use can produce evidence-grounded answers. Experimental results validate our framework's utility, showing improvements in agent performance when leveraging web-page information and multi-agent specialization.
Submission Type: Emerging
Copyright Form: pdf
Submission Number: 92