Keywords: online learning, mechanism design, incentive-aware learning, multi-armed bandits, incentive design
TL;DR: We introduce the strategic click-bandit problem and design an incentive-aware learning algorithm that induces desirable arm strategies while simultaneously minimizing regret.
Abstract: We study a strategic variant of the multi-armed bandit problem, which we coin the \emph{strategic click-bandit}. This model is motivated by applications in online recommendation where the choice of recommended items depends on both the click-through rates and the post-click rewards. Like in classical bandits, rewards follow a fixed unknown distribution. However, we assume that the click-through rate of each arm is chosen strategically by the arm (e.g., a host on Airbnb) in order to maximize the number of times it gets clicked. The algorithm designer does not know the post-click rewards nor the arms' actions (i.e., strategically chosen click-rates) in advance, and must learn both values over time. To solve this problem, we design an incentive-aware learning algorithm, UCB-S, which achieves two goals simultaneously: (a) aligning incentives by incentivizing desirable arm actions under uncertainty; (b) learning unknown parameters. We approximately characterize all Nash equilibria among arms under UCB-S and show a $\tilde O(\sqrt{KT})$ regret bound in every equilibrium. We also show that incentive-unaware algorithms generally fail to achieve low regret in the strategic click-bandit setup.
Submission Number: 43
Loading