Decentralized Learning Dynamics in the Gossip Model

Published: 26 Oct 2023, Last Modified: 13 Dec 2023NeurIPS 2023 Workshop PosterEveryoneRevisionsBibTeX
Keywords: decentralized learning, multi-armed bandits, multiplicative weights, distributed gossip model
Abstract: We study a distributed multi-armed bandit setting among a population of $n$ memory-constrained nodes in the gossip model: at each round, every node locally adopts one of $m$ arms, observes a reward drawn from the arm’s (adversarially chosen) distribution, and then communicates with a randomly sampled neighbor, exchanging information to determine its policy in the next round. We introduce and analyze several families of dynamics for this task that are *decentralized*: each node's decision is entirely local and depends only on its most recently obtained reward and that of the neighbor it sampled. We show a connection between the global evolution of these decentralized dynamics with a certain class of *“zero-sum” multiplicative weight update* algorithms, and we develop a general framework for analyzing the population-level regret of these natural protocols. Using this framework, we derive sublinear regret bounds under a wide range of parameter regimes (i.e., the size of $m$ and $n$) in an adversarial reward setting (where the mean of each arm's distribution can vary over time), when the number of rounds $T$ is at most logarithmic in $n$. Further, we show that these protocols can approximately optimize convex functions over the simplex when the reward distributions are generated from a stochastic gradient oracle.
Submission Number: 94
Loading