Collaborative Regret Minimization for Piecewise-Stationary Multi-Armed Bandit

Published: 01 Jan 2023, Last Modified: 13 May 2025, Venue: EUSIPCO 2023, License: CC BY-SA 4.0
Abstract: We study a structured multi-agent multi-armed bandit (MAB) problem in a non-stationary environment. All agents in the system face the same piecewise-stationary MAB problem, so they share information, to the extent that the graph links allow, to accelerate learning. Each agent aims to minimize the regret of sequential decision-making, that is, the expected total loss from not playing the optimal arm at each time step. We propose a solution to this problem, RBO-Coop-UCB, which involves an efficient multi-agent UCB algorithm with a Bayesian change point detector at its core, enhanced by a collaboration mechanism for performance improvement. Theoretically, we establish an upper bound on the expected group regret of RBO-Coop-UCB. Numerical experiments on real-world datasets demonstrate that the proposed method outperforms state-of-the-art algorithms.
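For concreteness, the regret notion described in the abstract can be written out as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: $M$ agents, $K$ arms, horizon $T$, $\mu_{k,t}$ the mean reward of arm $k$ at time $t$ (constant within each stationary segment), and $a_{m,t}$ the arm played by agent $m$ at time $t$.

```latex
% Illustrative notation only (assumed, not the paper's own definitions).
% Per-agent regret: expected total loss from not playing the optimal arm.
R_m(T) = \mathbb{E}\left[ \sum_{t=1}^{T} \left( \max_{k \in \{1,\dots,K\}} \mu_{k,t} - \mu_{a_{m,t},\,t} \right) \right]

% Group regret: the sum of the per-agent regrets over all M agents.
R(T) = \sum_{m=1}^{M} R_m(T)
```

Under this reading, the optimal arm $\arg\max_k \mu_{k,t}$ can change at each change point, which is why a plain stationary UCB rule would need a mechanism such as change point detection to keep the regret low.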