Keywords: Federated Bandits, Multi-armed Bandits, Best-of-both-worlds
Abstract: This paper studies federated multi-armed bandit (MAB) problems in which multiple agents collaborate over a communication network to solve a common MAB problem. We focus on the heterogeneous setting, in which no single agent can identify the globally best arm from its locally biased observations alone. In this setting, different agents may select the same arm at the same time step yet receive different rewards. We propose a novel algorithm, \textsc{FedFTRL}, for this problem; to our knowledge, it is the first to achieve near-optimal regret guarantees in both stochastic and adversarial environments. Notably, in the adversarial regime, our algorithm achieves $O(T^{\frac{1}{2}})$ regret, a significant improvement over the state-of-the-art $O(T^{\frac{2}{3}})$ bound \citep{yi2023doubly}. We also provide empirical evaluations comparing our algorithm with baseline methods, demonstrating its effectiveness on both synthetic and real-world datasets.
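The abstract does not specify the \textsc{FedFTRL} update itself. As background only, the sketch below illustrates the single-agent FTRL-with-Tsallis-entropy step that underlies classical best-of-both-worlds bandit algorithms such as Tsallis-INF (Zimmert and Seldin): the arm-sampling distribution solves $p_i \propto 4/(\eta_t (L_i - x))^2$ for a normalizing constant $x$, and losses are estimated by importance weighting. All function names, the bisection solver, and the $\eta_t = 1/\sqrt{t}$ schedule are illustrative assumptions, not the paper's method.

```python
import math


def tsallis_inf_probs(L, eta):
    """Sampling distribution of FTRL with 1/2-Tsallis entropy.

    Solves p_i = 4 / (eta * (L_i - x))**2 with sum_i p_i = 1,
    where L are cumulative (importance-weighted) loss estimates.
    NOTE: illustrative background, not the paper's FedFTRL update.
    """
    K, m = len(L), min(L)
    # At x = m - 2*sqrt(K)/eta each p_i <= 1/K (sum <= 1);
    # at x = m - 2/eta the best arm's p_i = 1 (sum >= 1).
    lo, hi = m - 2.0 * math.sqrt(K) / eta, m - 2.0 / eta
    for _ in range(80):  # bisection on the normalizer x
        x = 0.5 * (lo + hi)
        s = sum(4.0 / (eta * (Li - x)) ** 2 for Li in L)
        if s > 1.0:
            hi = x
        else:
            lo = x
    x = 0.5 * (lo + hi)
    p = [4.0 / (eta * (Li - x)) ** 2 for Li in L]
    z = sum(p)
    return [pi / z for pi in p]


def tsallis_inf_step(L, t, draw_loss, uniform):
    """One round: sample an arm, then update L by importance weighting."""
    eta = 1.0 / math.sqrt(t)  # assumed schedule for illustration
    p = tsallis_inf_probs(L, eta)
    r, acc, arm = uniform, 0.0, len(p) - 1
    for i, pi in enumerate(p):  # inverse-CDF sampling of the arm
        acc += pi
        if r <= acc:
            arm = i
            break
    loss = draw_loss(arm)
    L[arm] += loss / p[arm]  # unbiased importance-weighted estimate
    return arm, loss
```

The same update is the natural building block for a federated variant, where agents would additionally exchange (possibly delayed or compressed) loss estimates over the network; how \textsc{FedFTRL} does this is specified in the paper, not here.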
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 5653