Multi-Agent Interpolated Policy Gradients

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: multi-agent reinforcement learning, stochastic policy gradients, deterministic policy gradients, value function factorization, bias-variance trade-off
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Policy gradient methods typically suffer from high variance, which is further amplified in the multi-agent setting by the exponential growth of the joint action space. While value factorization is a popular approach for efficiently reducing the complexity of the value function, integrating it with policy gradients to reduce variance is challenging, since the limitations of the factorization structure introduce bias. This paper addresses this underexplored bias-variance trade-off by proposing a novel policy gradient method for MARL that uses a convex combination of the joint Q-function and a factorized Q-function. The result is a policy gradient that interpolates between stochastic and factorized deterministic policy gradients, enabling a more flexible trade-off between bias and variance. Theoretical results validate the effectiveness of our approach, showing that factorized value functions can effectively reduce variance while potentially maintaining low bias. Empirical experiments on several benchmarks demonstrate that our approach outperforms existing state-of-the-art methods in efficiency and stability.
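The core interpolation idea in the abstract can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the function name, the mixing coefficient `nu`, and the assumption that the two gradient estimates are already computed are all hypothetical choices for exposition.

```python
import numpy as np

def interpolated_policy_gradient(g_stochastic, g_deterministic, nu):
    """Convex combination of two policy gradient estimates (illustrative sketch).

    g_stochastic:    stochastic policy gradient from the joint Q-function
                     (unbiased, but high variance).
    g_deterministic: factorized deterministic policy gradient
                     (low variance, but possibly biased by the
                     factorization structure).
    nu in [0, 1]:    interpolation weight; nu=1 recovers the pure
                     stochastic gradient, nu=0 the factorized
                     deterministic one.
    """
    assert 0.0 <= nu <= 1.0, "interpolation weight must lie in [0, 1]"
    g_stochastic = np.asarray(g_stochastic, dtype=float)
    g_deterministic = np.asarray(g_deterministic, dtype=float)
    # Convex combination trades off bias against variance.
    return nu * g_stochastic + (1.0 - nu) * g_deterministic
```

Tuning `nu` moves the estimator along the bias-variance spectrum described in the abstract; how the paper sets or adapts this weight is specified in the full text.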
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4339