GEM-Bench: A Benchmark for Ad-Injected Response Generation within Generative Engine Marketing

Silan Hu; Shiqi Zhang; Yimin Shi; Xiaokui Xiao

17 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0

Keywords: Datasets and Benchmarks, Generative AI Monetization, Ad-injected Response Generation

TL;DR: This work presents GEM-Bench, the first benchmark for evaluating ad-injected response generation, providing datasets, metrics, and baselines to advance research on balancing user satisfaction and engagement.

Abstract: Generative Engine Marketing (GEM) is an emerging ecosystem for monetizing generative engines, such as LLM-based chatbots, by seamlessly integrating relevant advertisements into their responses. At the core of GEM lies the generation and evaluation of ad-injected responses. However, existing benchmarks are not specifically designed for this purpose, which limits future research. To address this gap, we propose GEM-Bench, the first comprehensive benchmark for ad-injected response generation in GEM. GEM-Bench includes three curated datasets covering both chatbot and search scenarios, a metric ontology that captures multiple dimensions of user satisfaction and engagement, and several baseline solutions implemented within an extensible multi-agent framework. Our preliminary results indicate that, while simple prompt-based methods achieve reasonable engagement, as measured by metrics such as click-through rate, they often reduce user satisfaction. In contrast, approaches that insert ads into pre-generated ad-free responses help mitigate this issue but introduce additional overhead. These findings highlight the need for future research on designing more effective and efficient solutions for generating ad-injected responses in GEM.

Primary Area: datasets and benchmarks

Submission Number: 9352
