Convergence Rates of Bayesian Network Policy Gradient for Cooperative Multi-Agent Reinforcement Learning
Keywords: multi-agent reinforcement learning, multi-agent coordination, Bayesian network
TL;DR: This paper studies the convergence rate of policy gradient methods with action dependencies determined by a Bayesian network.
Abstract: Human coordination often benefits from executing actions in a correlated manner, leading to improved cooperation. This concept holds potential for enhancing cooperative multi-agent reinforcement learning (MARL). Despite this potential, recent advances in MARL predominantly focus on decentralized execution, which favors scalability by avoiding action correlation among agents. A recent study introduced a Bayesian network to incorporate correlations between agents' action selections within their joint policy, demonstrating global convergence to Nash equilibria under a tabular softmax policy parameterization in cooperative Markov games. In this work, we extend these theoretical results by establishing the convergence rate of policy gradient with a Bayesian network joint policy under log-barrier regularization.
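For intuition, a minimal sketch of the setup (standard forms are assumed here; the paper's exact notation and regularization constant may differ): with $n$ agents and a directed acyclic graph $G$ over them, a Bayesian network joint policy conditions each agent's action on the actions of its parents $\mathrm{pa}_G(i)$, and a log-barrier regularizer penalizes near-deterministic conditionals,

$$\pi_\theta(a \mid s) \;=\; \prod_{i=1}^{n} \pi_{\theta_i}\!\big(a_i \mid s,\, a_{\mathrm{pa}_G(i)}\big), \qquad J_\lambda(\theta) \;=\; V^{\pi_\theta}(\rho) \;+\; \frac{\lambda}{n\,|\mathcal{S}|\,|\mathcal{A}|} \sum_{s,\,i,\,a_i} \log \pi_{\theta_i}\!\big(a_i \mid s,\, a_{\mathrm{pa}_G(i)}\big),$$

so that gradient ascent on $J_\lambda$ keeps the tabular softmax policy bounded away from the boundary of the simplex, which is the usual mechanism behind convergence-rate guarantees for log-barrier regularized policy gradient.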
Submission Number: 11