Abstract: We consider the problem of optimal repeated user nudging on online platforms when nudge effectiveness
is unknown and decays with repeated use of the same nudge. We model the optimal
nudging problem as an online learning problem with K arm types (corresponding to different nudge types), bandit
feedback, and non-stationary rewards. Furthermore, our model incorporates the cost of designing new nudges,
which is necessary to keep them effective over time. We show that in the full-information
setting (when all the model parameters are known), a cyclic policy that regenerates arms of a single
type after a fixed interval is optimal for the long-run average reward. Somewhat surprisingly,
we find that this cyclic policy incurs constant regret (independent of time) even in the finite-time setting.
Leveraging ideas from this analysis, we reduce the online learning problem of optimizing repeated nudges
to learning the optimal nudge type and the corresponding cycle length, and construct an Upper Confidence Bound (UCB) based algorithm that incurs sublinear regret (O(√T)), which is rate-optimal in this setting.
Numerical experiments on both synthetic data and a model calibrated with real-world data from an
EdTech setting show considerable improvement over benchmark methods and demonstrate the applicability
of the proposed framework.
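To make the reduction concrete, the following minimal Python sketch (not the paper's algorithm) treats each (nudge type, cycle length) pair as a bandit arm whose pull commits to one full regeneration cycle, and selects among them with a standard UCB index. The base rewards, decay factor, design cost, and candidate cycle lengths are all assumptions chosen for illustration, not the paper's calibrated model; rewards are averaged per round so that arms with different cycle lengths are comparable under the long-run average-reward criterion.

```python
import math
import random

# Illustrative sketch only: a UCB learner over (nudge type, cycle length)
# pairs, mirroring the abstract's reduction of the repeated-nudging problem.
# All parameter values below are assumptions for illustration.

BASE_REWARD = {0: 1.0, 1: 0.6}   # assumed mean effectiveness per nudge type
DECAY = 0.9                      # assumed per-use decay of a nudge's effect
DESIGN_COST = 0.8                # assumed cost of designing (regenerating) a nudge
CYCLE_LENGTHS = [2, 4, 8]        # candidate regeneration intervals

def run_cycle(nudge_type, cycle_len, rng):
    """Per-round noisy reward over one cycle: pay the design cost once, then
    reuse the nudge for cycle_len rounds as its effectiveness decays."""
    total = -DESIGN_COST
    for use in range(cycle_len):
        mean = BASE_REWARD[nudge_type] * DECAY ** use
        total += mean + rng.gauss(0.0, 0.1)  # bandit feedback: noisy reward only
    return total / cycle_len  # per-round average makes cycle lengths comparable

def ucb_nudging(horizon, seed=0):
    rng = random.Random(seed)
    arms = [(k, L) for k in BASE_REWARD for L in CYCLE_LENGTHS]
    counts = {a: 0 for a in arms}
    means = {a: 0.0 for a in arms}
    t = 0
    while t < horizon:
        untried = [a for a in arms if counts[a] == 0]
        if untried:  # play each (type, cycle length) arm once first
            arm = untried[0]
        else:        # then follow the UCB index
            arm = max(arms, key=lambda a: means[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        r = run_cycle(arm[0], arm[1], rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
        t += arm[1]  # one pull commits to a full cycle of arm[1] rounds
    return max(arms, key=lambda a: means[a])

# With these assumed parameters the learner typically settles on (0, 4):
# regenerate the stronger nudge every 4 rounds, trading design cost against decay.
print(ucb_nudging(horizon=20000))
```

Note the interior optimum: under these assumed parameters, the shortest cycle pays the design cost too often while the longest lets effectiveness decay too far, which is exactly the cost-versus-decay tradeoff the cyclic policy resolves.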