Keywords: revenue management, rewards programs, fairness, online learning
Abstract: Points-based rewards programs are a prevalent way to incentivize customer loyalty. In these programs, customers who make repeated purchases from a seller accumulate points, working toward eventual redemption of a free reward. While these programs can generate significant revenue gains for the seller if implemented correctly, they have recently come under scrutiny due to accusations of unfair practices in their implementation. Motivated by these real-world concerns, this paper studies the problem of {\it fairly} designing points-based rewards programs, with a special focus on two major obstacles that put fairness at odds with their effectiveness: (i) the incentive to exploit customer heterogeneity by personalizing programs to customers' purchase behavior, and (ii) the risk of devaluing customers' previously earned points when sellers must experiment in uncertain environments. To study this problem, we focus on the popular "Buy $N$, Get One Free" (BNGO) rewards programs. We first show that the optimal \emph{individually fair} program, which uses the same redemption threshold for all customers, loses at most a constant factor of $1+\ln 2$ in revenue compared to the optimal personalized strategy, which may unfairly offer different customers different thresholds. We then tackle the problem of designing {\it temporally fair} learning algorithms in the presence of demand uncertainty.
Toward this goal, we design a "stable" learning algorithm that limits the risk of point devaluation due to experimentation by changing the redemption threshold only $O(\log T)$ times over a learning horizon of length $T$. We prove that this algorithm incurs $\widetilde{O}(\sqrt{T})$ regret in expectation; this guarantee is optimal up to polylogarithmic factors. We then modify this algorithm to only ever decrease redemption thresholds, leading to improved fairness at the cost of only a constant factor in regret. Finally, we conduct extensive numerical experiments that demonstrate the limited value of personalization in average-case settings, as well as the strong practical performance of our proposed learning algorithms.
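To make the shape of the first guarantee concrete: writing $\mathrm{Rev}^{*}_{\mathrm{pers}}$ and $\mathrm{Rev}^{*}_{\mathrm{fair}}$ for the optimal revenues of the personalized and single-threshold programs (notation we introduce here only for this restatement), the bound reads
$$\mathrm{Rev}^{*}_{\mathrm{pers}} \;\le\; (1+\ln 2)\,\mathrm{Rev}^{*}_{\mathrm{fair}} \;\approx\; 1.69\,\mathrm{Rev}^{*}_{\mathrm{fair}}.$$

The low-switching idea behind the stable learner can also be illustrated with a short sketch. The following is a minimal, hypothetical epoch-based successive-elimination routine over a finite grid of candidate thresholds; it is not the paper's actual algorithm, and every name in it (`stable_bngo_learner`, `revenue_oracle`, the confidence radius) is our own illustrative choice. Doubling epoch lengths cap the number of threshold changes at $O(K \log T)$ for $K$ candidates, and standard elimination arguments give $\widetilde{O}(\sqrt{T})$ regret under bounded i.i.d. revenue feedback.

```python
import math

def stable_bngo_learner(candidate_thresholds, revenue_oracle, T):
    """Hypothetical low-switching learner for a BNGO redemption threshold.

    Plays each surviving candidate threshold for a whole epoch before
    switching, and doubles the epoch length each round, so the threshold
    changes only O(K log T) times over a horizon of length T.
    """
    active = list(candidate_thresholds)        # thresholds still in play
    total = {n: 0.0 for n in active}           # cumulative observed revenue
    pulls = {n: 0 for n in active}             # periods played per threshold
    means = {n: 0.0 for n in active}
    epoch_len, t, switches = 1, 0, 0

    while t < T:
        for n in active:                       # one contiguous block per threshold
            if t >= T:
                break
            switches += 1                      # threshold changes once per block
            for _ in range(epoch_len):
                if t >= T:
                    break
                total[n] += revenue_oracle(n)  # noisy per-period revenue feedback
                pulls[n] += 1
                t += 1
        for n in active:
            means[n] = total[n] / max(pulls[n], 1)
        # Drop thresholds whose upper confidence bound falls below the best
        # lower confidence bound (standard successive elimination).
        radius = {n: math.sqrt(2.0 * math.log(max(T, 2)) / max(pulls[n], 1))
                  for n in active}
        best_lcb = max(means[n] - radius[n] for n in active)
        active = [n for n in active if means[n] + radius[n] >= best_lcb]
        epoch_len *= 2                         # doubling => logarithmically many epochs

    return max(active, key=means.get), switches
```

A decrease-only variant in the spirit of the paper's fairness refinement would additionally restrict `active` to thresholds no larger than the one currently played, trading a constant factor in regret for the guarantee that customers' outstanding points never lose value.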
Submission Number: 33