Fixing Incomplete Value Function Decomposition for Multi-Agent Reinforcement Learning

ICLR 2026 Conference Submission 21736 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: multi-agent, reinforcement-learning, value-function-decomposition, cooperative, dec-pomdps
TL;DR: We provide a simple formulation for IGM-complete value function decomposition, and develop a novel family of value function decomposition models based on it.
Abstract: Value function decomposition methods for cooperative multi-agent reinforcement learning compose joint values from individual per-agent utilities, and train them using a joint objective. To ensure that the action selection process between individual utilities and joint values remains consistent, it is imperative for the composition to satisfy the individual-global max (IGM) property. Although satisfying IGM itself is straightforward, most existing methods (e.g., VDN, QMIX) have limited representation capabilities and are unable to represent the full class of IGM values, and the one exception that has no such limitation (QPLEX) is unnecessarily complex. In this work, we present a simple formulation of the full class of IGM values that naturally leads to the derivation of QFIX, a novel family of value function decomposition models that expand the representation capabilities of prior models via a thin "fixing" layer. We derive multiple variants of QFIX, and implement three variants in two well-known multi-agent frameworks. We perform an empirical evaluation on multiple SMACv2 and Overcooked environments, which confirms that QFIX (i) succeeds in enhancing the performance of prior methods, (ii) learns more stably and performs better than its main competitor QPLEX, and (iii) achieves this while employing the simplest and smallest mixing models.
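For reference, the IGM property mentioned in the abstract is the standard consistency condition from the value-decomposition literature; one common statement of it, using notation assumed here rather than taken from the paper, is:

$$\operatorname*{arg\,max}_{\mathbf{a}}\, Q_{\mathrm{jt}}(\boldsymbol{\tau}, \mathbf{a}) \;=\; \Bigl(\operatorname*{arg\,max}_{a_1} Q_1(\tau_1, a_1),\; \ldots,\; \operatorname*{arg\,max}_{a_n} Q_n(\tau_n, a_n)\Bigr)$$

That is, greedy action selection on each agent's individual utility $Q_i$ jointly recovers a greedy action for the joint value $Q_{\mathrm{jt}}$, which is what keeps decentralized execution consistent with the centrally trained joint objective.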
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 21736