Safe Online Bid Optimization with Return On Investment and Budget Constraints

22 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: reinforcement learning
Keywords: advertising, online learning
Abstract: In online marketing, the advertisers' goal is a tradeoff between achieving high volumes and high profitability. Companies' business units address this tradeoff by maximizing volumes while guaranteeing a minimum Return On Investment (ROI) level. Technically, this task can be naturally modeled as a combinatorial optimization problem subject to ROI and budget constraints that can be solved online. In this setting, the uncertainty over the constraints' parameters plays a crucial role, since the constraints can be arbitrarily violated during the learning process due to the algorithms' uncontrolled exploration. Such violations represent a major obstacle to adopting online techniques in real-world applications. Thus, controlling the algorithms' exploration during learning is paramount to making humans trust online learning tools. This paper studies the nature of both the optimization and the learning problems. In particular, we show that the learning problem is inapproximable within any factor (unless $\textsf{P} = \textsf{NP}$) and provide a pseudo-polynomial-time algorithm to solve its discretized version. Subsequently, we prove that no online learning algorithm can violate the (ROI or budget) constraints only a sublinear number of times during the learning process while guaranteeing sublinear regret. We provide the $\textsf{GCB}$ algorithm, which guarantees sublinear regret at the cost of a linear number of constraint violations, and $\textsf{GCB}_{safe}$, which guarantees with high probability a constant upper bound on the number of constraint violations at the cost of linear regret. Moreover, we design $\textsf{GCB}_{safe}(\psi,\phi)$, which guarantees both sublinear regret and safety with high probability at the cost of accepting tolerances $\psi$ and $\phi$ in the satisfaction of the ROI and budget constraints, respectively. Finally, we provide experimental results to compare the regret and constraint violations of $\textsf{GCB}$ and $\textsf{GCB}_{safe}$.
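To make the discretized optimization problem in the abstract concrete, the sketch below picks one bid per campaign from a finite grid so as to maximize total revenue while keeping total cost within a budget and the revenue-to-cost ratio above a minimum ROI. It is only an illustration under stated assumptions: the function name `best_bid_allocation`, the table layout, and the numbers are hypothetical, the revenue and cost tables are assumed to be known exactly (no learning), and the search is plain brute-force enumeration, not the pseudo-polynomial-time algorithm or the $\textsf{GCB}$ variants of the paper.

```python
from itertools import product


def best_bid_allocation(revenue, cost, budget, min_roi):
    """Brute-force search over one grid bid per campaign (illustrative only).

    revenue[j][k] and cost[j][k] are the assumed-known expected revenue and
    cost of campaign j when it uses the k-th bid of its grid.
    Returns (best total revenue, tuple of chosen bid indices), or
    (None, None) when no combination satisfies both constraints.
    """
    best_value, best_choice = None, None
    # Enumerate every combination of bid indices (exponential in #campaigns).
    for choice in product(*(range(len(row)) for row in revenue)):
        total_rev = sum(revenue[j][k] for j, k in enumerate(choice))
        total_cost = sum(cost[j][k] for j, k in enumerate(choice))
        # Budget constraint: total spend must not exceed the budget.
        if total_cost > budget:
            continue
        # ROI constraint, written as revenue >= min_roi * cost so that a
        # zero-cost allocation does not cause a division by zero.
        if total_rev < min_roi * total_cost:
            continue
        if best_value is None or total_rev > best_value:
            best_value, best_choice = total_rev, choice
    return best_value, best_choice


# Hypothetical data: two campaigns, three candidate bids each.
revenue = [[0, 10, 16], [0, 8, 18]]
cost = [[0, 4, 10], [0, 5, 9]]

print(best_bid_allocation(revenue, cost, budget=20, min_roi=2.0))
print(best_bid_allocation(revenue, cost, budget=20, min_roi=0.0))
```

With these made-up numbers, the allocation with the highest revenue fits the budget but falls short of an ROI of 2, so the ROI-constrained search settles on a lower-revenue allocation. This is the volume-versus-profitability tradeoff the abstract describes, here without any uncertainty on the tables.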
Submission Number: 5352