Abstract: We consider the problem of optimal repeated user nudging on online platforms when nudge effectiveness
is unknown and decays with repeated use of the same nudge. We model the optimal
nudging problem as an online learning problem with K arm types (corresponding to different nudge types), bandit
feedback, and non-stationary rewards. Furthermore, our model incorporates the cost of designing new nudges,
which is necessary to keep them effective over time. We show that in the full-information
setting (when all the model parameters are known), a cyclic policy that regenerates arms of a single
type after a fixed interval is optimal for the long-run average reward. Somewhat surprisingly,
we find that this cyclic policy incurs constant regret (independent of time) even in the finite-time setting.
Leveraging ideas from this analysis, we reduce the online learning problem of optimizing repeated nudges
to learning the optimal nudge type and the corresponding cycle length, and construct an Upper Confidence Bound (UCB) based algorithm that incurs sublinear regret (O(√T)), which is rate-optimal in this setting.
Numerical experiments on both synthetic data and a model calibrated with real-world data from an
EdTech setting show considerable improvement over benchmark methods and demonstrate the applicability
of the proposed framework.
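To make the reduction concrete, the following minimal Python sketch (not the paper's algorithm) treats each (nudge type, cycle length) pair as a bandit arm whose pull commits to one full regeneration cycle, and selects among them with a standard UCB index. The base rewards, decay factor, design cost, and candidate cycle lengths are all assumptions chosen for illustration, not the paper's calibrated model; rewards are averaged per round so that arms with different cycle lengths are comparable under the long-run average-reward criterion.

```python
import math
import random

# Illustrative sketch only: a UCB learner over (nudge type, cycle length)
# pairs, mirroring the abstract's reduction of the repeated-nudging problem.
# All parameter values below are assumptions for illustration.

BASE_REWARD = {0: 1.0, 1: 0.6}   # assumed mean effectiveness per nudge type
DECAY = 0.9                      # assumed per-use decay of a nudge's effect
DESIGN_COST = 0.8                # assumed cost of designing (regenerating) a nudge
CYCLE_LENGTHS = [2, 4, 8]        # candidate regeneration intervals

def run_cycle(nudge_type, cycle_len, rng):
    """Per-round noisy reward over one cycle: pay the design cost once, then
    reuse the nudge for cycle_len rounds as its effectiveness decays."""
    total = -DESIGN_COST
    for use in range(cycle_len):
        mean = BASE_REWARD[nudge_type] * DECAY ** use
        total += mean + rng.gauss(0.0, 0.1)  # bandit feedback: noisy reward only
    return total / cycle_len  # per-round average makes cycle lengths comparable

def ucb_nudging(horizon, seed=0):
    rng = random.Random(seed)
    arms = [(k, L) for k in BASE_REWARD for L in CYCLE_LENGTHS]
    counts = {a: 0 for a in arms}
    means = {a: 0.0 for a in arms}
    t = 0
    while t < horizon:
        untried = [a for a in arms if counts[a] == 0]
        if untried:  # play each (type, cycle length) arm once first
            arm = untried[0]
        else:        # then follow the UCB index
            arm = max(arms, key=lambda a: means[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        r = run_cycle(arm[0], arm[1], rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
        t += arm[1]  # one pull commits to a full cycle of arm[1] rounds
    return max(arms, key=lambda a: means[a])

# With these assumed parameters the learner typically settles on (0, 4):
# regenerate the stronger nudge every 4 rounds, trading design cost against decay.
print(ucb_nudging(horizon=20000))
```

Note the interior optimum: under these assumed parameters, the shortest cycle pays the design cost too often while the longest lets effectiveness decay too far, which is exactly the cost-versus-decay tradeoff the cyclic policy resolves.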