Near-Optimal Regret Bounds for Federated Multi-armed Bandits with Fully Distributed Communication

Published: 07 May 2025, Last Modified: 13 Jun 2025 · UAI 2025 Poster · CC BY 4.0
Keywords: federated multi-armed bandit, consensus estimation subroutine
Abstract: In this paper, we study federated multi-armed bandit (FMAB) problems in which agents can communicate only with their neighbors. All agents aim to solve a common multi-armed bandit~(MAB) problem so as to minimize their individual regrets, while group regret is minimized as well. In a federated bandit problem, an agent cannot estimate the global reward means of the arms from its local observations alone, so bandit learning algorithms typically adopt a consensus estimation strategy to handle this heterogeneity. However, existing algorithms on fully distributed communication graphs have so far achieved only suboptimal regret for this problem. To address this, we propose a fully distributed online consensus estimation subroutine (\texttt{CES}) that estimates the global mean without bias. Integrating this consensus estimator into a distributed successive elimination bandit framework yields our federated bandit algorithm, which significantly improves both individual and group regrets over previous approaches; we also provide an in-depth analysis of the lower bound for this problem.
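The abstract's key idea is that agents on a communication graph estimate a *global* quantity while only exchanging messages with neighbors. The paper's \texttt{CES} subroutine is not specified here, so the sketch below shows only the standard building block it conceptually relates to: distributed average consensus via gossip with a doubly stochastic weight matrix. All names (`gossip_consensus`, the Metropolis-style weights `W`) are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def gossip_consensus(local_means, W, rounds=200):
    """Generic average-consensus sketch (NOT the paper's CES algorithm).

    local_means: length-n array, agent i's local estimate of an arm's mean.
    W: n x n doubly stochastic matrix whose sparsity pattern respects the
       communication graph (W[i, j] > 0 only if i and j are neighbors or i == j).
    Because W is doubly stochastic, repeated mixing drives every agent's
    estimate toward the global average of the initial local estimates.
    """
    x = np.asarray(local_means, dtype=float)
    for _ in range(rounds):
        x = W @ x  # each agent averages only with its own neighbors
    return x

# Example: 3 agents on a line graph 0 - 1 - 2, Metropolis weights
# (symmetric and doubly stochastic, so the consensus value is the plain mean).
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])

estimates = gossip_consensus([0.1, 0.5, 0.9], W)
# All three agents' estimates approach the global mean 0.5.
```

Plain gossip like this converges to the average of a *fixed* snapshot; handling rewards that keep arriving online, and doing so without bias, is precisely the harder problem the \texttt{CES} subroutine is stated to solve.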
Supplementary Material: zip
Latex Source Code: zip
Signed PMLR Licence Agreement: pdf
Readers: auai.org/UAI/2025/Conference, auai.org/UAI/2025/Conference/Area_Chairs, auai.org/UAI/2025/Conference/Reviewers, auai.org/UAI/2025/Conference/Submission99/Authors, auai.org/UAI/2025/Conference/Submission99/Reproducibility_Reviewers
Submission Number: 99