Convergence Rates of Bayesian Network Policy Gradient for Cooperative Multi-Agent Reinforcement Learning
Keywords: multi-agent reinforcement learning, multi-agent coordination, Bayesian network
TL;DR: This paper studies the convergence rate of policy gradient methods with action dependencies determined by a Bayesian network.
Abstract: Human coordination often benefits from executing actions in a correlated manner, leading to improved cooperation. This concept holds potential for enhancing cooperative multi-agent reinforcement learning (MARL). Despite this potential, recent advances in MARL predominantly focus on decentralized execution, which favors scalability by avoiding action correlation among agents. A recent study introduced a Bayesian network to incorporate correlations between agents' action selections within their joint policy, demonstrating global convergence to Nash equilibria under a tabular softmax policy parameterization in cooperative Markov games. In this work, we extend these theoretical results by establishing the convergence rate of policy gradient with a Bayesian network joint policy under log-barrier regularization.
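For intuition, a minimal sketch of the setup (standard forms are assumed here; the paper's exact notation and regularization constant may differ): with $n$ agents and a directed acyclic graph $G$ over them, a Bayesian network joint policy conditions each agent's action on the actions of its parents $\mathrm{pa}_G(i)$, and a log-barrier regularizer penalizes near-deterministic conditionals,

$$\pi_\theta(a \mid s) \;=\; \prod_{i=1}^{n} \pi_{\theta_i}\!\big(a_i \mid s,\, a_{\mathrm{pa}_G(i)}\big), \qquad J_\lambda(\theta) \;=\; V^{\pi_\theta}(\rho) \;+\; \frac{\lambda}{n\,|\mathcal{S}|\,|\mathcal{A}|} \sum_{s,\,i,\,a_i} \log \pi_{\theta_i}\!\big(a_i \mid s,\, a_{\mathrm{pa}_G(i)}\big),$$

so that gradient ascent on $J_\lambda$ keeps the tabular softmax policy bounded away from the boundary of the simplex, which is the usual mechanism behind convergence-rate guarantees for log-barrier regularized policy gradient.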
Submission Number: 11