Abstract: Real-time bidding (RTB) has achieved outstanding success in online display advertising, which has become one of the most influential businesses. Given historical ad impressions under the second-price auction mechanism, the advertiser's optimal bidding strategy is determined by a core parameter that corresponds to the optimal solution of a constrained optimization problem. However, because impressions in online display advertising arrive sequentially, it is highly non-trivial to obtain the optimal core parameter in advance without knowing the complete impression set. For this reason, recent methods have generally transformed the core parameter determination problem into a sequential parameter adjustment problem and solved it using reinforcement learning (RL). This paper proposes a simple and effective Model-Based Automatic Bidding algorithm, MBAB, which explicitly models the uncertainty of the dynamic auction environment and then uses dynamic programming to obtain the current optimal adjustment of the core parameter. Compared with model-free methods, MBAB avoids burdensome construction of a simulated environment and is better suited to production deployment because it does not suffer from the thorny sim-to-real issue. Furthermore, MBAB uses the optimal bidding formula to model the online market environment at a coarse granularity, which alleviates the scalability problem caused by the fine-grained environment modeling of previous model-based methods. To accurately describe the impression distribution and non-stationarity of the online market environment, we adopt probabilistic modeling and propose a novel monotonicity constraint to regulate the model output. Numerical experiments show that the proposed MBAB substantially outperforms existing baselines on various constrained RTB tasks in the production environment.