An Actor-critic Reinforcement Learning Model for Optimal Bidding in Online Display Advertising

Congde Yuan, Mengzhuo Guo, Chaoneng Xiang, Shuangyang Wang, Guoqing Song, Qingpeng Zhang

Published: 01 Jan 2022, Last Modified: 12 May 2023, CIKM 2022

Abstract: The real-time bidding (RTB) paradigm allows advertisers to submit a bid for each impression in online display advertising. Advertisers commonly aim to maximize the total value of the impressions they win under constraints on certain key performance indicators (KPIs). Unfortunately, existing RTB methods in industrial applications rarely achieve the optimum because of stochastic decision scenarios and complex consumer behavior. In this study, we address RTB for mobile gaming, where in-app purchases are highly uncertain, which makes individual impression opportunities difficult to evaluate. We first formulate the bidding process as a constrained optimization problem and then propose an actor-critic reinforcement learning (ACRL) model to obtain the optimal policy in a dynamic decision environment. To avoid feeding the model too many samples with zero labels, we propose a new way to quantify impression opportunities that integrates in-app actions, such as conversions and purchases, with the characteristics of the candidate ad inventory. Moreover, ACRL learns a Gaussian distribution to simulate the audience's decisions in a more realistic bidding scenario by incorporating additional contextual side information about both the media and the audience. We also describe how to deploy the learned model online to help adjust the final bid. Finally, we conduct comprehensive offline experiments to demonstrate the effectiveness of ACRL and carefully design an online A/B test. The online results verify the efficacy of ACRL in terms of multiple critical commercial indicators. ACRL has been deployed on the Tencent online display advertising platform and affects billions of ad requests every day. We believe the proposed modifications for optimal bidding in RTB are practically innovative and can inspire related work in this field.
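The abstract does not give ACRL's equations, so the following is only a minimal sketch, under our own assumptions, of the generic ideas it names: a Gaussian policy proposes a bid adjustment, a critic supplies a temporal-difference (TD) error that drives the actor update, and a Lagrange multiplier folds a spend constraint into the reward. Everything here is hypothetical and not taken from the paper: the class `GaussianActorCritic`, the helpers `lagrangian_reward` and `dual_update`, the linear actor and critic, the fixed standard deviation `sigma`, and the choice to sample the log of a multiplier applied to a base bid.

```python
import numpy as np


def lagrangian_reward(value, cost, lam):
    """Hypothetical: fold a spend constraint into the reward as value - lam * cost."""
    return value - lam * cost


def dual_update(lam, spend, budget, lr):
    """Hypothetical projected dual ascent: raise lam when spend exceeds budget, keep lam >= 0."""
    return max(0.0, lam + lr * (spend - budget))


class GaussianActorCritic:
    """Hypothetical linear actor-critic with a Gaussian policy over a log bid multiplier.

    Assumptions (not from the paper): the state is a fixed-length feature vector,
    the actor mean and the critic are linear in the features, and sigma is fixed.
    """

    def __init__(self, n_features, sigma=0.2, actor_lr=0.01,
                 critic_lr=0.05, gamma=0.99, seed=0):
        self.w = np.zeros(n_features)  # actor weights: mean of the log-multiplier
        self.v = np.zeros(n_features)  # critic weights: linear state-value estimate
        self.sigma = sigma
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr
        self.gamma = gamma
        self.rng = np.random.default_rng(seed)

    def act(self, s):
        """Sample a log-multiplier a ~ N(w . s, sigma^2)."""
        mu = float(self.w @ s)
        return float(self.rng.normal(mu, self.sigma))

    def bid(self, s, base_bid):
        """Scale a base bid by exp(a), so the adjusted bid stays positive."""
        return base_bid * float(np.exp(self.act(s)))

    def update(self, s, a, r, s_next, done):
        """One-step actor-critic update; returns the TD error."""
        v_s = float(self.v @ s)
        v_next = 0.0 if done else float(self.v @ s_next)
        td_error = r + self.gamma * v_next - v_s
        # Critic: semi-gradient TD(0) step toward the bootstrapped target.
        self.v += self.critic_lr * td_error * s
        # Actor: gradient of log N(a; w . s, sigma^2) w.r.t. w is (a - mu) / sigma^2 * s.
        mu = float(self.w @ s)
        self.w += self.actor_lr * td_error * (a - mu) / self.sigma ** 2 * s
        return td_error
```

In use, one would compute `r = lagrangian_reward(value, cost, lam)` for each auction outcome, call `update` with the sampled action, and periodically adjust `lam` with `dual_update` against the remaining budget. A fixed `sigma` keeps the sketch short; a learned variance (closer to the abstract's "learns a Gaussian distribution") would add a second output to the actor.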
