A Near-optimal High-probability Swap-Regret Upper Bound for Multi-agent Bandits in Unknown General-sum Games

Published: 08 May 2023, Last Modified: 26 Jun 2023, UAI 2023, Readers: Everyone
Keywords: Swap regret, correlated equilibrium, high-probability bound
TL;DR: We prove a high-probability bound for the instantaneous swap regret with respect to the randomness of both the learner and the adversaries.
Abstract: In this paper, we study a multi-agent bandit problem in an unknown general-sum game repeated for a number of rounds (i.e., learning in a black-box game with bandit feedback), where the agents have no information about the underlying game structure and cannot observe each other's actions and rewards. In each round, each agent plays an arm (i.e., action) from a (possibly different) arm set (i.e., action set) and receives only the reward of the played arm, which is affected by the other agents' actions. The objective of each agent is to minimize her own cumulative swap regret, where swap regret is a generic performance measure for online learning algorithms. We are the first to give a near-optimal high-probability swap-regret upper bound, based on a refined martingale analysis, for exponential-weighting-based algorithms with the implicit exploration technique; this bound further implies a bound on the expected swap regret, rather than on the pseudo-regret studied in the literature. It also guarantees that correlated equilibria can be achieved in a polynomial number of rounds if the algorithm is played by all agents. Furthermore, we conduct numerical experiments to verify the performance of the studied algorithm.
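As a rough illustration of the quantities named in the abstract, the sketch below computes the empirical swap regret of a single agent and shows the standard building blocks of an exponential-weighting update with implicit exploration. It is not the paper's algorithm. The function names, the loss convention (for example, loss = 1 - reward for rewards in [0, 1]), and the parameters `eta` and `gamma` are assumptions made for illustration. Note that `swap_regret` needs the full reward vector of every round, which a bandit learner never observes, so it is usable only for offline evaluation.

```python
import math


def swap_regret(actions, rewards):
    """Empirical swap regret of one agent (offline evaluation only).

    actions[t] is the arm played in round t; rewards[t][a] is the reward arm a
    would have earned in round t given the other agents' actions.
    """
    n_arms = len(rewards[0])
    # gain[i][j]: total reward of arm j over the rounds in which arm i was played
    gain = [[0.0] * n_arms for _ in range(n_arms)]
    for a, r in zip(actions, rewards):
        for j in range(n_arms):
            gain[a][j] += r[j]
    # The best swap function can be chosen independently for each played arm.
    return sum(max(gain[i]) - gain[i][i] for i in range(n_arms))


def ix_loss_estimate(probs, played, loss, gamma):
    """Implicit-exploration estimate: loss / (p + gamma) on the played arm, 0 elsewhere."""
    return [loss / (probs[a] + gamma) if a == played else 0.0
            for a in range(len(probs))]


def exp_weights(cum_loss_est, eta):
    """Exponential-weighting distribution from cumulative loss estimates."""
    m = min(cum_loss_est)  # shift by the minimum for numerical stability
    w = [math.exp(-eta * (l - m)) for l in cum_loss_est]
    total = sum(w)
    return [x / total for x in w]
```

The extra `gamma` in the denominator biases the loss estimate downward but keeps it bounded, which is what typically makes high-probability (rather than only expected) bounds tractable for this family of estimators.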
Supplementary Material: pdf
Other Supplementary Material: zip