A new look at fairness in stochastic multi-armed bandit problems

Published: 28 Jan 2022, Last Modified: 13 Feb 2023
Venue: ICLR 2022 Submitted
Readers: Everyone
Abstract: We study an important variant of the stochastic multi-armed bandit (MAB) problem that takes fairness into consideration. Instead of directly maximizing the cumulative expected reward, the learner must balance the total reward against the fairness level. In this paper, we offer a new perspective on MAB with fairness and formulate the problem in a penalization framework, in which a rigorous penalized regret can be well defined and a more sophisticated regret analysis is possible. Under this framework, we propose a hard-threshold UCB-like algorithm that enjoys many merits, including asymptotic fairness, nearly optimal regret, and a better tradeoff between reward and fairness. We establish both gap-dependent and gap-independent upper bounds, and give lower bounds to illustrate the tightness of our theoretical analysis. Extensive experimental results corroborate the theory and show the superiority of our method over existing methods.
One-sentence Summary: Paper has 24 pages
