DeepResearch Retail: Benchmarking Tool-Augmented Deep Research in the E-Commerce Domain

Rafael Ferreira; Flavio Di Palo; Huilin Lu; Ayush Jain; Harsha Aduri

Published: 18 Apr 2026, Last Modified: 23 Apr 2026

Venue: ACL 2026 Industry Track Poster

License: CC BY 4.0

Keywords: Deep Research, Tool use, Multi-agent systems, Benchmark evaluation, E-commerce

TL;DR: DeepResearch Retail is a benchmark framework grounded in e-commerce data for evaluating deep research systems that use both web search and internal APIs. Hybrid-ReAct is a multi-agent architecture that supports parallel reasoning and tool use for report generation.

Abstract: Deep Research (DR) systems autonomously retrieve and synthesize information from web sources; however, industrial DR applications face a critical gap: the effective integration of internal tools with web search. In this work, we introduce DeepResearch Retail, an evaluation framework grounded in real-world e-commerce data for assessing Deep Research with tools (DR+Tools) in realistic commercial settings. The framework evaluates both factual faithfulness and multidimensional response quality when systems reason over heterogeneous web and internal data sources. We further present Hybrid-ReAct, a multi-agent architecture that demonstrates how collaborative reasoning and tool use can produce evidence-grounded answers. Experimental results validate the framework's utility, showing that agent performance improves when agents leverage web-page content and multi-agent specialization.

Submission Type: Emerging

Submission Number: 92