One More Step Towards Reality: Cooperative Bandits with Imperfect Communication

21 May 2021, 20:45 (edited 09 Nov 2021) | NeurIPS 2021 Poster | Readers: Everyone
  • Keywords: multi-armed bandits, multi-agent learning, cooperative bandit, imperfect communication
  • TL;DR: We propose and analyze new algorithms for cooperative multi-agent bandit learning with imperfect communication.
  • Abstract: The cooperative bandit problem is becoming increasingly relevant due to its applications in large-scale decision-making. However, most research on this problem focuses exclusively on the setting with perfect communication, whereas in real-world distributed settings, communication often takes place over stochastic networks, with arbitrary corruptions and delays. In this paper, we study cooperative bandit learning under three typical real-world communication scenarios, namely, (a) message-passing over stochastic time-varying networks, (b) instantaneous reward-sharing over a network with random delays, and (c) message-passing with adversarially corrupted rewards, including Byzantine communication. For each of these environments, we propose decentralized algorithms that achieve competitive performance, along with near-optimal guarantees on the incurred group regret. Furthermore, in the setting with perfect communication, we present an improved delayed-update algorithm that outperforms the existing state-of-the-art on various network topologies. Finally, we present tight network-dependent minimax lower bounds on the group regret. Our proposed algorithms are straightforward to implement and obtain competitive empirical performance.
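
To make scenario (b) concrete, the following is a minimal illustrative sketch, not the paper's algorithm: each agent runs plain UCB1 on Bernoulli arms and broadcasts every observed reward to all other agents, with each message arriving after a uniformly random delay. The complete communication graph, the uniform delay model, and all names (`simulate_group_regret`, `max_delay`, and so on) are assumptions made for illustration only. Group regret is taken here to be the expected regret summed over all agents and rounds.

```python
import math
import random


def simulate_group_regret(means, n_agents=4, horizon=2000, max_delay=5, seed=0):
    """Simulate UCB1 agents sharing rewards with random delays (illustrative only).

    Assumptions (not taken from the paper): complete graph, Bernoulli rewards,
    and delays drawn uniformly from {1, ..., max_delay}.
    Returns the group pseudo-regret summed over all agents and rounds.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    best_mean = max(means)

    # Per-agent statistics: pull counts and reward sums for every arm,
    # including observations received from other agents.
    counts = [[0] * n_arms for _ in range(n_agents)]
    sums = [[0.0] * n_arms for _ in range(n_agents)]

    in_flight = []  # messages as (arrival_round, receiver, arm, reward)
    group_regret = 0.0

    for t in range(1, horizon + 1):
        # Deliver every message whose delay has elapsed.
        pending = []
        for arrival, receiver, arm, reward in in_flight:
            if arrival <= t:
                counts[receiver][arm] += 1
                sums[receiver][arm] += reward
            else:
                pending.append((arrival, receiver, arm, reward))
        in_flight = pending

        for agent in range(n_agents):
            # UCB1 index; arms with no observations are tried first.
            def index(a):
                n = counts[agent][a]
                if n == 0:
                    return math.inf
                return sums[agent][a] / n + math.sqrt(2.0 * math.log(t) / n)

            arm = max(range(n_arms), key=index)
            reward = 1.0 if rng.random() < means[arm] else 0.0

            counts[agent][arm] += 1
            sums[agent][arm] += reward
            group_regret += best_mean - means[arm]

            # Broadcast the observation; each copy gets its own random delay.
            for other in range(n_agents):
                if other != agent:
                    delay = rng.randint(1, max_delay)
                    in_flight.append((t + delay, other, arm, reward))

    return group_regret


if __name__ == "__main__":
    print(simulate_group_regret([0.9, 0.5]))
```

Varying `max_delay` or `n_agents` in this sketch gives a rough feel for how delayed sharing affects the group regret; the algorithms and guarantees in the paper itself are more refined than this baseline.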
