Keywords: bargaining, large language models, seller agents, intent recognition, dialogue systems, e-commerce, negotiation, benchmark, multi-turn evaluation
TL;DR: We introduce a benchmark for LLM-based seller agents in multi-turn e-commerce bargaining, evaluating how well models track and interpret buyer intents. The framework can be applied to other negotiation or dialogue settings.
Abstract: In online second-hand marketplaces, multi-turn bargaining is a crucial part of seller-buyer interactions. Large Language Models (LLMs) can act as seller agents, negotiating with buyers on behalf of sellers under given business constraints. A critical ability for such agents is to track and accurately interpret cumulative buyer intents across long negotiations, which directly impacts bargaining effectiveness. We introduce a multi-turn evaluation framework for measuring the bargaining ability of seller agents in e-commerce dialogues, testing whether an agent can extract and track buyer intents over the course of a negotiation. Our contributions are: (1) a large-scale e-commerce bargaining benchmark spanning 622 categories, 9,892 products, and 3,014 tasks; (2) a turn-level evaluation framework grounded in Theory of Mind (ToM), enabling detailed assessment of model performance beyond outcome-only metrics; and (3) an automated pipeline that constructs intent annotations and evaluation data from large-scale dialogues, transferable across datasets and negotiation domains.
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 8637