Cooperative Multi-agent Bandits: Distributed Algorithms with Optimal Individual Regret and Communication Costs

Published: 01 Jun 2024, Last Modified: 17 Jun 2024 · CoCoMARL 2024 Poster · CC BY 4.0
Keywords: multi-agent bandits, communication, regret analysis
TL;DR: A distributed multi-agent bandit algorithm with a constant communication guarantee, together with a tight matching communication lower bound
Abstract: Recently, there has been extensive study of cooperative multi-agent multi-armed bandits, in which a set of distributed agents cooperatively play the same multi-armed bandit game. The goal is to develop bandit algorithms that achieve optimal group and individual regrets with low communication between agents. Prior algorithms either cannot achieve constant communication costs or fail to achieve optimal individual regrets. This paper presents a simple yet effective communication policy and integrates it into a learning algorithm for cooperative bandits. Our algorithm achieves the best of both paradigms: optimal individual regret and constant communication costs. We also provide a tight communication lower bound that matches the constant communication upper bound of our algorithm in all terms, suggesting the optimality of our algorithm design and analysis.
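The abstract does not spell out the communication policy itself, so the snippet below is only a generic illustration of the cooperative-bandit setting it describes: several agents run UCB on the same arms and broadcast their local statistics under a standard doubling-based trigger. This is not the paper's algorithm (whose policy attains constant, rather than logarithmic, communication); the arm count `K`, agent count `N_AGENTS`, horizon `T`, and the trigger rule are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of cooperative multi-agent bandits (NOT the paper's
# algorithm): each agent runs UCB using its own pulls plus the latest
# broadcast statistics of other agents, and broadcasts only when its local
# pull count for an arm doubles since its last broadcast.

rng = np.random.default_rng(0)
K, N_AGENTS, T = 5, 4, 2000            # arms, agents, rounds per agent (assumed)
true_means = rng.uniform(0.2, 0.8, K)  # Bernoulli arm means (assumed)

local_counts = np.zeros((N_AGENTS, K))   # each agent's own pulls
local_sums = np.zeros((N_AGENTS, K))     # each agent's own reward sums
shared_counts = np.zeros((N_AGENTS, K))  # last snapshot each agent broadcast
shared_sums = np.zeros((N_AGENTS, K))
messages = 0

for t in range(1, T + 1):
    for a in range(N_AGENTS):
        # Combine own fresh statistics with others' latest broadcasts.
        counts = local_counts[a] + shared_counts.sum(0) - shared_counts[a]
        sums = local_sums[a] + shared_sums.sum(0) - shared_sums[a]
        means = np.where(counts > 0, sums / np.maximum(counts, 1), np.inf)
        bonus = np.sqrt(2 * np.log(max(t, 2)) / np.maximum(counts, 1))
        arm = int(np.argmax(means + bonus))

        reward = float(rng.random() < true_means[arm])  # Bernoulli reward
        local_counts[a, arm] += 1
        local_sums[a, arm] += reward

        # Event-triggered broadcast: fire only when the local count doubles
        # relative to this agent's last shared snapshot.
        if local_counts[a, arm] >= 2 * max(shared_counts[a, arm], 1):
            shared_counts[a, arm] = local_counts[a, arm]
            shared_sums[a, arm] = local_sums[a, arm]
            messages += 1

empirical_regret = T * N_AGENTS * true_means.max() - local_sums.sum()
print(f"empirical group regret ~ {empirical_regret:.1f}, messages sent: {messages}")
```

Under this doubling rule the number of broadcasts grows logarithmically in the horizon, which is the kind of communication cost the paper improves upon with its constant-communication policy.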
Submission Number: 3