Offline Multi-agent Reinforcement Learning with Sequential Score Decomposition

Dan Qiao; Wenhao Li; Shanchao Yang; Hongyuan Zha; Baoxiang Wang

25 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: Multi-agent Reinforcement Learning, Offline RL, Diffusion Models

TL;DR: We analyze the failure cases for policy-based offline MARL algorithms and provide an unbiased score function decomposition method with diffusion models in offline MARL as the solution.

Abstract: Offline multi-agent reinforcement learning (MARL) faces significant challenges due to distribution shift, exacerbated by the high dimensionality of joint actions and complex joint behavior policy distributions. While existing methods often focus on independent learning or offline value decomposition with conservative value estimation, they may still produce out-of-distribution (OOD) joint actions and suffer reduced performance. This is primarily due to the lack of exploration opportunities and the implicit policy dependencies in offline settings. To address these challenges, we propose an offline policy decomposition method incorporating joint policy regularization constraints. Our approach uses a diffusion generative model to capture the joint behavior policy, then decomposes the extracted score function. This decomposition is then used to regularize individual policies in a decentralized manner. Experimental results demonstrate that our method achieves state-of-the-art performance on continuous control tasks in standard offline MARL benchmarks.
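
The abstract does not spell out the decomposition itself. As a rough illustration only, the sketch below checks the generic chain-rule identity that a sequential factorization of a joint density implies for its score (the gradient of the log-density with respect to the actions). It is not the paper's algorithm: the two-agent setting, the correlated Gaussian standing in for the joint behavior policy, and all names (`MEAN`, `COV`, `joint_score`, `sequential_score`) are assumptions made for this example, and no diffusion model is involved.

```python
import numpy as np

# Hypothetical two-agent joint behavior policy: a correlated 2-D Gaussian
# over the joint action (a1, a2). These numbers are illustrative only.
MEAN = np.array([0.5, -1.0])
COV = np.array([[1.0, 0.6],
                [0.6, 2.0]])


def joint_score(a):
    """Score of the joint density: grad_a log p(a1, a2) = -COV^{-1} (a - MEAN)."""
    return -np.linalg.solve(COV, a - MEAN)


def sequential_score(a):
    """Same score, assembled from the factorization p(a1) * p(a2 | a1)."""
    a1, a2 = a
    slope = COV[0, 1] / COV[0, 0]                      # d E[a2 | a1] / d a1
    cond_mean = MEAN[1] + slope * (a1 - MEAN[0])       # mean of p(a2 | a1)
    cond_var = COV[1, 1] - COV[0, 1] ** 2 / COV[0, 0]  # variance of p(a2 | a1)

    # Agent 2: only the conditional term depends on a2.
    score_a2 = -(a2 - cond_mean) / cond_var
    # Agent 1: its own marginal term plus the term through which a1
    # shifts the conditional mean of a2.
    score_a1 = -(a1 - MEAN[0]) / COV[0, 0] + slope * (a2 - cond_mean) / cond_var
    return np.array([score_a1, score_a2])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for a in rng.normal(size=(5, 2)):
        assert np.allclose(joint_score(a), sequential_score(a))
    print("joint and sequentially assembled scores agree")
```

In this toy case the score component for the second agent depends only on its conditional density given the first agent's action, which is the kind of per-agent quantity a decentralized regularizer could plausibly use. How the paper actually extracts and decomposes the score from a diffusion model is not stated in the abstract.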

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 4169
