Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints

Published: 31 Dec 2020 · Last Modified: 06 May 2026 · IJCAI 2020 · Everyone · arXiv.org perpetual, non-exclusive license
Abstract: The problem of multi-armed bandits (MAB) with fairness constraints has emerged as an important research topic recently. For such problems, one common objective is to maximize the total reward within a fixed number of pulls while satisfying a fairness requirement: a minimum selection fraction for each individual arm in the long run. Previous works have made substantial advancements in designing efficient online selection solutions; however, they fail to achieve a sublinear regret bound when incorporating such fairness constraints. In this paper, we study a combinatorial MAB problem with a concave objective and fairness constraints. In particular, we adopt a new approach that combines online convex optimization with bandit methods to design selection algorithms. Our algorithm is computationally efficient and, more importantly, achieves a sublinear regret bound with probability guarantees. Finally, we evaluate the performance of our algorithm via extensive simulations and demonstrate that it substantially outperforms the baselines.
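The fairness requirement described above, a minimum long-run selection fraction per arm, can be illustrated with a minimal sketch. The snippet below is a hypothetical single-pull example and not the paper's algorithm: it pairs a standard UCB index with per-arm virtual queues that accumulate "fairness debt", a common device for enforcing long-run fraction constraints. The `fair_ucb` helper, its parameters, and the arm statistics are all illustrative assumptions.

```python
import math
import random

def fair_ucb(means, min_frac, horizon, seed=0):
    """Illustrative fairness-aware UCB (not the paper's algorithm):
    per-arm virtual queues accumulate fairness debt at rate min_frac[i]
    and drain on each pull, steering long-run selection fractions
    toward the required minimums."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k          # number of pulls per arm
    rew_sum = [0.0] * k       # cumulative observed reward per arm
    queues = [0.0] * k        # virtual queue = outstanding fairness debt
    for t in range(1, horizon + 1):
        scores = []
        for i in range(k):
            if counts[i] == 0:
                ucb = float("inf")  # force one initial pull per arm
            else:
                ucb = rew_sum[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
            # a large queue (unmet fairness) dominates the UCB term
            scores.append(queues[i] + ucb)
        arm = max(range(k), key=scores.__getitem__)
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli arm
        counts[arm] += 1
        rew_sum[arm] += reward
        for i in range(k):
            # debt grows by the required fraction, shrinks when pulled
            queues[i] = max(0.0, queues[i] + min_frac[i] - (1.0 if i == arm else 0.0))
    return counts
```

With means `[0.9, 0.5, 0.2]` and minimum fractions `[0.1, 0.1, 0.3]`, the low-mean third arm still receives roughly 30% of the pulls, while most of the remaining budget goes to the highest-mean arm.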