FSV: Learning to Factorize Soft Value Function for Cooperative Multi-Agent Reinforcement Learning

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: cooperative MARL, value function factorization, stochastic policy, continuous tasks
Abstract: We explore energy-based solutions for cooperative multi-agent reinforcement learning (MARL) using the idea of value function factorization in centralized training with decentralized execution (CTDE). Existing CTDE-based factorization methods are susceptible to relative overgeneralization, a well-known game-theoretic pathology in which agents converge to a suboptimal Nash equilibrium. To resolve this issue, we propose a novel factorization method for cooperative MARL, named FSV, which learns to factorize the joint soft value function into individual ones for decentralized execution. Theoretical analysis shows that FSV solves a rich class of factorization tasks. Our experiment on the well-known Max of Two Quadratics game shows that FSV fully converges to the global optimum in continuous tasks through local search in the joint action space. We also evaluate FSV on a challenging set of StarCraft II micromanagement tasks and show that it significantly outperforms existing factorization-based multi-agent reinforcement learning methods.
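To make the core idea concrete, below is a minimal sketch of factorizing a joint soft value function across agents. It assumes a simple additive (VDN-style) mixer and per-agent soft values V_i = alpha * logsumexp(Q_i / alpha); the paper's actual FSV architecture and mixing function may differ, and the class names `SoftQAgent` and `FactorizedSoftValue` are hypothetical.

```python
# Sketch of soft value factorization for CTDE, NOT the paper's exact FSV
# architecture. Assumptions: additive mixing across agents (VDN-style) and
# an entropy-regularized "soft" value per agent.
import torch
import torch.nn as nn


class SoftQAgent(nn.Module):
    """Per-agent utility network: maps a local observation to Q_i(o_i, .)."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class FactorizedSoftValue(nn.Module):
    """Joint soft value as a sum of individual soft values (simplest mixer)."""

    def __init__(self, n_agents: int, obs_dim: int, n_actions: int,
                 alpha: float = 0.1):
        super().__init__()
        self.alpha = alpha  # entropy temperature of the soft value
        self.agents = nn.ModuleList(
            SoftQAgent(obs_dim, n_actions) for _ in range(n_agents)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim)
        # Individual soft value: V_i = alpha * logsumexp(Q_i / alpha)
        v_i = [
            self.alpha * torch.logsumexp(agent(obs[:, i]) / self.alpha, dim=-1)
            for i, agent in enumerate(self.agents)
        ]
        # Additive factorization: V_tot = sum_i V_i
        return torch.stack(v_i, dim=-1).sum(dim=-1)  # (batch,)


if __name__ == "__main__":
    model = FactorizedSoftValue(n_agents=3, obs_dim=8, n_actions=5)
    v_tot = model(torch.randn(4, 3, 8))
    print(v_tot.shape)  # torch.Size([4])
```

Because each V_i depends only on the agent's local observation, each agent can act from its own factor at execution time while the summed V_tot is trained centrally, which is the CTDE property the abstract refers to.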
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We propose a novel stochastic-policy solution in CTDE for cooperative MARL that factorizes the soft value function and significantly outperforms existing factorization MARL methods on both discrete and continuous tasks.
Reviewed Version (pdf): https://openreview.net/references/pdf?id=Hx5ET-W05