TL;DR: We provide a formulation for IGM-complete value function decomposition, and develop a novel family of value function decomposition models based on it.
Abstract: Value function decomposition methods for cooperative multi-agent reinforcement learning combine individual per-agent utilities into joint values trained on a joint objective. To ensure consistent action selection between individual utilities and joint values, it is imperative for the composition to satisfy *individual-global max* (IGM). However, most methods that satisfy IGM are characterized by limited representation capabilities that hinder their performance, and the one known exception is unnecessarily convoluted. In this work, we reveal a minimalistic formulation of IGM that inspires the derivation of QFIX, a novel family of value function decomposition methods that expand the representation capabilities of prior methods by means of a small "fixing" network. We implement three variants of QFIX, and demonstrate empirically that QFIX matches or exceeds state-of-the-art performance with better stability.
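To make the IGM condition referenced in the abstract concrete, below is a minimal sketch of the consistency check it imposes, assuming tabular per-agent utilities and a tabular joint value; the function name `igm_holds` and the example values are illustrative and not part of the submission, and the QFIX "fixing" network itself is not reproduced here.

```python
import numpy as np

def igm_holds(joint_q, per_agent_qs):
    """Check the individual-global max (IGM) condition on tabular values.

    joint_q:      array of shape (A_1, ..., A_n) with the joint value Q_tot
                  for every joint action.
    per_agent_qs: list of n arrays, agent i's individual utility over its
                  A_i actions.

    IGM requires that the joint action maximizing Q_tot coincides with the
    tuple of per-agent greedy actions (the argmax of each individual utility),
    so that decentralized greedy action selection is consistent with the
    centralized joint value.
    """
    greedy_joint = tuple(
        int(i) for i in np.unravel_index(np.argmax(joint_q), joint_q.shape)
    )
    greedy_individual = tuple(int(np.argmax(q)) for q in per_agent_qs)
    return greedy_joint == greedy_individual

# Two agents with three actions each: hypothetical joint and individual values.
q_tot = np.array([[1.0, 0.2, 0.0],
                  [0.3, 2.5, 0.1],
                  [0.0, 0.4, 0.2]])
q_agents = [np.array([0.1, 0.9, 0.0]),   # agent 0's greedy action is 1
            np.array([0.2, 1.1, 0.3])]   # agent 1's greedy action is 1

print(igm_holds(q_tot, q_agents))  # True: both selections pick joint action (1, 1)
```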
Primary Area: Reinforcement Learning->Multi-agent
Keywords: multi-agent, reinforcement-learning, value-decomposition, cooperative
Submission Number: 4122