Projection-Optimal Monotonic Value Function Factorization in Multi-Agent Reinforcement Learning

Published: 01 Jan 2024, Last Modified: 30 Sept 2024 · AAMAS 2024 · CC BY-SA 4.0
Abstract: Value function factorization has emerged as the prevalent approach to cooperative multi-agent reinforcement learning under the centralized training and decentralized execution paradigm. Many of these algorithms ensure consistency between joint and local action selections for decentralized decision-making by factorizing the optimal joint action-value function with a monotonic mixing function of agent utilities. However, monotonic mixing functions also impose representational limitations, and finding the optimal projection of an unconstrained mixing function onto the class of monotonic functions remains an open problem. In this paper, we propose QPro, which casts this optimal projection problem for value function factorization as regret minimization over the projection weights of different transitions. The relaxed optimization problem is solved with the Lagrangian multiplier method, yielding the optimal projection weights in closed form. By minimizing the policy regret of expected returns, these weights narrow the gap between the optimal and the restricted monotonic mixing functions, thereby improving monotonic value function factorization. Our experiments demonstrate the effectiveness of the method, showing improved performance in environments with non-monotonic value functions.
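To make the weighted-projection idea concrete, below is a minimal sketch of per-transition weighted projection onto a monotonic mixing network. All names (MonotonicMixer, weighted_projection_loss, q_unrestricted) are illustrative placeholders rather than the authors' API, and the weighting rule shown is a simple stand-in: QPro derives its projection weights in closed form from a Lagrangian relaxation of a regret-minimization problem, which is not reproduced here.

```python
# Hypothetical sketch: weighted projection of an unrestricted joint value
# estimate onto a monotonic mixing function.  Names and the weighting rule
# are assumptions for illustration, not QPro's actual closed-form weights.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MonotonicMixer(nn.Module):
    """Mixes per-agent utilities into Q_tot using non-negative mixing weights,
    which keeps Q_tot monotonic in each agent's utility (as in QMIX-style mixers)."""

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        # Hypernetworks conditioned on the global state produce the mixing weights.
        self.w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.b1 = nn.Linear(state_dim, embed_dim)
        self.w2 = nn.Linear(state_dim, embed_dim)
        self.b2 = nn.Linear(state_dim, 1)

    def forward(self, utils: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # utils: (batch, n_agents), state: (batch, state_dim)
        b, n = utils.shape
        w1 = torch.abs(self.w1(state)).view(b, n, -1)   # non-negative -> monotonic
        h = F.elu(torch.bmm(utils.unsqueeze(1), w1).squeeze(1) + self.b1(state))
        w2 = torch.abs(self.w2(state))                   # non-negative -> monotonic
        return ((h * w2).sum(dim=1, keepdim=True) + self.b2(state)).squeeze(1)


def weighted_projection_loss(q_tot: torch.Tensor,
                             q_unrestricted: torch.Tensor,
                             weights: torch.Tensor) -> torch.Tensor:
    """Per-transition weighted squared error between the monotonic Q_tot and an
    unrestricted joint value estimate; the weights determine the projection."""
    return (weights * (q_tot - q_unrestricted) ** 2).mean()


if __name__ == "__main__":
    batch, n_agents, state_dim = 8, 3, 16
    mixer = MonotonicMixer(n_agents, state_dim)
    utils = torch.randn(batch, n_agents)      # per-agent chosen-action utilities
    state = torch.randn(batch, state_dim)
    q_unrestricted = torch.randn(batch)       # e.g. from an unconstrained critic
    # Placeholder weighting: emphasize transitions the monotonic class fits worst.
    with torch.no_grad():
        err = (mixer(utils, state) - q_unrestricted).abs()
        weights = F.softmax(err, dim=0) * batch
    loss = weighted_projection_loss(mixer(utils, state), q_unrestricted, weights)
    loss.backward()
```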