Keywords: Agent, LLM, E-commerce
Abstract: E-commerce agents play an important role in helping users meet their e-commerce needs. To promote further research on and application of these agents, benchmarking frameworks have been introduced for evaluating LLM agents in the e-commerce domain.
Despite this progress, current benchmarks do not evaluate agents' ability to handle mixed-type e-commerce dialogues and complex domain rules. To address this issue, this work first introduces a novel corpus, termed Mix-ECom,
which is constructed from real-world customer-service dialogues, post-processed to remove private user information and to add chain-of-thought (CoT) reasoning.
Specifically, Mix-ECom contains 4,799 samples with multiple dialogue types in each e-commerce dialogue, covering four dialogue types (QA, recommendation, task-oriented dialogue, and chit-chat),
three e-commerce task types (pre-sales, logistics, after-sales), and 82 e-commerce rules.
Furthermore, this work builds baselines on Mix-ECom and proposes a dynamic framework to further improve performance.
Results show that current e-commerce agents lack sufficient capabilities to handle e-commerce dialogues, due to hallucinations caused by complex domain rules. The dataset will be made publicly available.
Primary Area: datasets and benchmarks
Submission Number: 17668