Individual Regret in Cooperative Stochastic Multi-Armed Bandits

Idan Barnea; Tal Lancewicki; Yishay Mansour

Published: 18 Sept 2025, Last Modified: 29 Oct 2025. Venue: NeurIPS 2025 poster. License: CC BY 4.0

Keywords: MAB, Multi-armed bandit, stochastic MAB, cooperation

TL;DR: We provide individual regret bounds for cooperative stochastic multi-armed bandits over communication graphs that are independent of the graph diameter, and we analyze the trade-offs with message size and the number of communication rounds.

Abstract: We study the regret in stochastic Multi-Armed Bandits (MAB) with multiple agents that communicate over an arbitrary connected communication graph. We analyze a variant of the Cooperative Successive Elimination algorithm, $\texttt{Coop-SE}$, and show an individual regret bound of ${O}(\mathcal{R} / m + A^2 + A \sqrt{\log T})$ and a nearly matching lower bound. Here $A$ is the number of actions, $T$ is the time horizon, $m$ is the number of agents, and $\mathcal{R} = \sum_{\Delta_i > 0}\log(T)/\Delta_i$ is the optimal single-agent regret, where $\Delta_i$ is the sub-optimality gap of action $i$. Our work is the first to show an individual regret bound in cooperative stochastic MAB that is independent of the graph's diameter. Communication networks raise additional considerations beyond regret, such as message size and the number of communication rounds. First, we show that our regret bound holds even if we restrict the messages to be of logarithmic size. Second, for a logarithmic number of communication rounds, we obtain a regret bound of ${O}(\mathcal{R} / m+A \log T)$.
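To get a feel for how the terms in the stated bounds compare, the following minimal sketch evaluates $\mathcal{R}$ and each additive term for a made-up problem instance. It is only an illustration of the formulas in the abstract, not the paper's algorithm: the function names, the gap values, the horizon, and the number of agents are all hypothetical, the logarithm is assumed to be natural, and the constants hidden by the $O(\cdot)$ notation are ignored.

```python
import math


def single_agent_regret(gaps, horizon):
    """R = sum over actions with Delta_i > 0 of log(T) / Delta_i (natural log assumed)."""
    return sum(math.log(horizon) / gap for gap in gaps if gap > 0)


def bound_terms(gaps, horizon, num_agents):
    """Evaluate each term appearing in the abstract's bounds, with O(.) constants dropped."""
    num_actions = len(gaps)
    log_t = math.log(horizon)
    r = single_agent_regret(gaps, horizon)
    return {
        "R": r,                                  # optimal single-agent regret
        "R_over_m": r / num_agents,              # cooperative speed-up term
        "A_squared": num_actions ** 2,           # additive A^2 term
        "A_sqrt_log_T": num_actions * math.sqrt(log_t),
        "A_log_T": num_actions * log_t,          # term in the few-rounds bound
    }


# Hypothetical instance: 4 actions (one optimal, gap 0), T = 10^6, m = 10 agents.
gaps = [0.0, 0.1, 0.2, 0.5]
terms = bound_terms(gaps, horizon=10**6, num_agents=10)
for name, value in terms.items():
    print(f"{name}: {value:.2f}")
```

In this toy instance the $\mathcal{R}/m$ term shrinks as agents are added, while the additive terms depend only on $A$ and $T$, which is the qualitative shape of the bounds described above.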

Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)

Submission Number: 9456
