TL;DR: We provide a formulation for IGM-complete value function decomposition, and develop a novel family of value function decomposition models based on it.
Abstract: Value function decomposition methods for cooperative multi-agent reinforcement learning combine individual per-agent utilities into joint values trained on a joint objective. To ensure consistent action selection between individual utilities and joint values, it is imperative for the composition to satisfy *individual-global max* (IGM). However, most methods that satisfy IGM are characterized by limited representation capabilities that hinder their performance, and the one known exception is unnecessarily convoluted. In this work, we reveal a minimalistic formulation of IGM that inspires the derivation of QFIX, a novel family of value function decomposition methods that expand the representation capabilities of prior methods by means of a small "fixing" network. We implement three variants of QFIX, and demonstrate empirically that QFIX matches or exceeds state-of-the-art performance with better stability.
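To make the IGM condition referenced in the abstract concrete, below is a minimal sketch of the consistency check it imposes, assuming tabular per-agent utilities and a tabular joint value; the function name `igm_holds` and the example values are illustrative and not part of the submission, and the QFIX "fixing" network itself is not reproduced here.

```python
import numpy as np

def igm_holds(joint_q, per_agent_qs):
    """Check the individual-global max (IGM) condition on tabular values.

    joint_q:      array of shape (A_1, ..., A_n) with the joint value Q_tot
                  for every joint action.
    per_agent_qs: list of n arrays, agent i's individual utility over its
                  A_i actions.

    IGM requires that the joint action maximizing Q_tot coincides with the
    tuple of per-agent greedy actions (the argmax of each individual utility),
    so that decentralized greedy action selection is consistent with the
    centralized joint value.
    """
    greedy_joint = tuple(
        int(i) for i in np.unravel_index(np.argmax(joint_q), joint_q.shape)
    )
    greedy_individual = tuple(int(np.argmax(q)) for q in per_agent_qs)
    return greedy_joint == greedy_individual

# Two agents with three actions each: hypothetical joint and individual values.
q_tot = np.array([[1.0, 0.2, 0.0],
                  [0.3, 2.5, 0.1],
                  [0.0, 0.4, 0.2]])
q_agents = [np.array([0.1, 0.9, 0.0]),   # agent 0's greedy action is 1
            np.array([0.2, 1.1, 0.3])]   # agent 1's greedy action is 1

print(igm_holds(q_tot, q_agents))  # True: both selections pick joint action (1, 1)
```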
Primary Area: Reinforcement Learning->Multi-agent
Keywords: multi-agent, reinforcement-learning, value-decomposition, cooperative
Submission Number: 4122