Keywords: multi-player multi-armed bandits, delayed feedback
Abstract: Multi-player multi-armed bandits have long been studied due to their applications in cognitive radio networks. In this setting, multiple players select arms at each time step and instantly receive feedback. Most research on this problem focuses on the content of that immediate feedback: whether it includes both the reward and collision information, or the reward alone. However, delay is common in cognitive networks when users perform spectrum sensing. In this paper, we design an algorithm, DDSE (Decentralized Delayed Successive Elimination), for multi-player multi-armed bandits with stochastically delayed feedback, and establish a regret bound. Compared with existing algorithms, which fail to address this problem, our algorithm enables players to adapt to delayed feedback and avoid collisions. We also derive a lower bound in the centralized setting to show that our algorithm is near-optimal. Numerical experiments on both synthetic and real-world datasets validate the effectiveness of our algorithm.
Supplementary Material: zip
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2083