Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search

Zhiyu Mou; Yiqin Lv; Miao Xu; Cheems Wang; Yixiu Mao; Jinghao Chen; Qichen Ye; Chao Li; Rongquan Bai; Chuan Yu; Jian Xu; Bo Zheng

Published: 26 Jan 2026, Last Modified: 11 Feb 2026

Venue: ICLR 2026 Oral

License: CC BY 4.0

Keywords: auto-bidding, offline reinforcement learning, generative decision making

Abstract: Auto-bidding serves as a critical tool for advertisers to improve their advertising performance. Recent progress has demonstrated that AI-Generated Bidding (AIGB), which learns a conditional generative planner from offline data, achieves superior performance compared to typical offline reinforcement learning (RL)-based auto-bidding methods. However, existing AIGB methods still face a performance bottleneck due to their inherent inability to explore beyond the static offline dataset. To address this, we propose AIGB-Pearl (Planning with EvaluAtor via RL), a novel method that integrates generative planning and policy optimization. The core of AIGB-Pearl lies in constructing a trajectory evaluator for scoring generation quality and designing a provably sound KL-Lipschitz-constrained score maximization scheme to ensure safe and efficient generalization beyond the offline dataset. A practical algorithm incorporating the synchronous coupling technique is further devised to ensure the model regularity required by the proposed scheme. Extensive experiments on both simulated and real-world advertising systems demonstrate the state-of-the-art performance of our approach.

Primary Area: applications to robotics, autonomy, planning

Submission Number: 24514
