Keywords: multi-player multi-armed bandits, max-min fariness
Abstract: We consider a multi-player multi-armed bandit problem (MP-MAB) where $N$ players compete for $K$ arms in $T$ rounds. The reward distribution is heterogeneous where each player has a different expected reward for the same arm. When multiple players select the same arm, they collide and obtain zero reward. In this paper, we aim to find the max-min fairness matching that maximizes the reward of the player who receives the lowest reward. This paper improves the existing regret upper bound result of $O(\log T\log \log T)$ to achieve max-min fairness. More specifically, our decentralized fair elimination algorithm (DFE) deals with heterogeneity and collision carefully and attains a regret upper bounded of $O((N^2+K)\log T / \Delta)$, where $\Delta$ is the minimum reward gap between max-min value and sub-optimal arms. We assume $N\leq K$ to guarantee all players can select their arms without collisions. In addition, we also provide an $\Omega(\max\{N^2, K\} \log T / \Delta)$ regret lower bound for this problem. This lower bound indicates that our algorithm is optimal with respect to key parameters, which significantly improves the performance of algorithms in previous work. Numerical experiments again verify the efficiency and improvement of our algorithms.
Supplementary Material: zip
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2079
Loading