Learning Explicit Credit Assignment for Multi-agent Joint Q-learning

Hangyu Mao; Jianye HAO; Dong Li; Jun Wang; Weixun Wang; Xiaotian Hao; Bin Wang; Kun Shao; Zhen Xiao; Wulong Liu

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 | ICLR 2022 Submitted | Readers: Everyone

Keywords: Multi-agent Credit Assignment, Multi-agent Joint Q-learning

Abstract: Multi-agent joint Q-learning based on Centralized Training with Decentralized Execution (CTDE) has become an effective technique for multi-agent cooperation. During centralized training, these methods essentially address the multi-agent credit assignment problem. However, most existing methods learn the credit assignment implicitly, just by ensuring that the joint Q-value satisfies the Bellman optimality equation. In contrast, we formulate an explicit credit assignment problem in which each agent suggests how to weight the individual Q-values so as to explicitly maximize the joint Q-value, in addition to guaranteeing the Bellman optimality of the joint Q-value. In this way, we can conduct credit assignment both among multiple agents and along the time horizon. Theoretically, we give a gradient ascent solution for this problem. Empirically, we instantiate the core idea with deep neural networks and propose Explicit Credit Assignment joint Q-learning (ECAQ) to facilitate multi-agent cooperation in complex problems. Extensive experiments show that ECAQ achieves interpretable credit assignment and superior performance compared to several advanced baselines.

One-sentence Summary: An explicit credit assignment method for IGM-based joint Q-learning.
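
The abstract describes weighting individual Q-values to maximize the joint Q-value by gradient ascent. The sketch below is a toy illustration of that idea only, not the ECAQ algorithm: it assumes fixed individual Q-values, a joint Q-value defined as their weighted sum, and softmax-normalized weights, none of which are specified in the abstract. The Bellman optimality constraint that the abstract pairs with this objective is omitted, and all function names are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax so the weights are positive and sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def joint_q(weights, individual_qs):
    # Assumed mixing rule for this toy: a weighted sum of individual Q-values.
    return sum(w * q for w, q in zip(weights, individual_qs))

def ascend_weights(individual_qs, steps=200, lr=0.5):
    # Gradient ascent on the weight logits to increase the joint Q-value.
    logits = [0.0] * len(individual_qs)
    for _ in range(steps):
        w = softmax(logits)
        q_tot = joint_q(w, individual_qs)
        # Analytic gradient: d q_tot / d logit_i = w_i * (q_i - q_tot).
        grads = [w_i * (q_i - q_tot) for w_i, q_i in zip(w, individual_qs)]
        logits = [l + lr * g for l, g in zip(logits, grads)]
    return softmax(logits)

individual_qs = [1.0, 3.0, 2.0]  # made-up values for three agents
weights = ascend_weights(individual_qs)
```

With fixed Q-values this objective simply shifts weight toward the agent with the largest individual Q-value, so the toy shows only the direction of the update. In the setting the abstract describes, the weights are learned jointly with the Q-values and under the Bellman optimality requirement.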
