Keywords: Cooperative Multi-Agent Reinforcement Learning, Credit Assignment
TL;DR: We propose a cooperative MARL framework with sequential credit assignment (SeCA) that deduces each agent’s contribution to the team's success one by one to learn better cooperation.
Abstract: Centralized training with decentralized execution is a standard paradigm for cooperative multi-agent reinforcement learning (MARL), in which credit assignment is a major challenge. In this paper, we propose a cooperative MARL method with sequential credit assignment (SeCA) that deduces each agent's contribution to the team's success one by one to learn better cooperation. We first present a sequential MARL framework, under which we introduce a new counterfactual advantage that evaluates each agent conditioned on the actions of its preceding agents in a given sequence. Because this credit assignment sequence strongly affects performance, we further present a sequence adjustment algorithm based on integrated gradients, which dynamically reorders the agents according to their contributions to the team. SeCA employs a single network that either estimates the joint Q-value for training the centralized critic or deduces the proposed counterfactual advantage of each agent for decentralized policy learning. Our method is evaluated on a challenging set of StarCraft II micromanagement tasks and achieves state-of-the-art performance.
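The abstract does not spell out the counterfactual advantage, so the following is only a hypothetical sketch, by analogy with COMA-style counterfactual baselines, of how the k-th agent in a fixed credit-assignment sequence could be evaluated conditioned on its predecessors' actions. The sequence notation, the observation history tau, and the partially-conditioned critic Q are assumptions for illustration, not notation taken from the paper.

% Hypothetical sketch, not the paper's stated definition: a COMA-style
% counterfactual advantage for the k-th agent in a fixed credit-assignment
% sequence (i_1, ..., i_n), conditioning on its predecessors' chosen actions.
\[
  A^{i_k}(s, \mathbf{a}) \;=\;
  Q\bigl(s,\, a^{i_1}, \ldots, a^{i_k}\bigr)
  \;-\; \sum_{a'} \pi^{i_k}\bigl(a' \mid \tau^{i_k}\bigr)\,
  Q\bigl(s,\, a^{i_1}, \ldots, a^{i_{k-1}},\, a'\bigr)
\]
% Here Q(s, a^{i_1}, ..., a^{i_k}) denotes the centralized critic's value
% with the first k agents' actions held fixed (later agents marginalized
% out under their policies). Agent i_k is credited with how much its chosen
% action improves on the policy-weighted average of its alternatives, with
% the preceding agents' actions fixed.

Under a form like this, the ordering matters because earlier agents' actions enter the conditioning set of every later agent, which is what would motivate the integrated-gradients sequence adjustment described in the abstract.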
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
Supplementary Material: zip