MRVF: Multi-Round Value Factorization with Guaranteed Iterative Improvement for Multi-Agent Reinforcement Learning

ICLR 2026 Conference Submission 15823 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Multi-agent reinforcement learning, Value Factorization
TL;DR: We propose a theoretical tool for convergence analysis in value factorization and a multi-round factorization framework that converges to the optimal solution.
Abstract: Value factorization restricts the joint action-value function to a monotonic form to enable efficient search for its optimum. However, the representational limitation of monotonic forms often leads to suboptimal results in cases with highly non-monotonic payoffs. Although recent approaches introduce additional conditions on the factorization to address this representational limitation, we develop a novel theory for convergence analysis which reveals that single-round factorizations, even with elaborate conditions, remain insufficient for global optimality. To address this issue, we propose a novel Multi-Round Value Factorization (MRVF) framework that refines solutions round by round and ultimately converges to the global optimum. To achieve this, we measure the non-negative incremental payoff of a solution relative to the preceding solution. This measurement enhances the monotonicity of the payoff surface and highlights solutions with higher payoff, enabling monotonic factorizations to identify them. We evaluate our method in three challenging environments: non-monotonic one-step games, predator-prey tasks, and the StarCraft II Multi-Agent Challenge (SMAC). Experimental results demonstrate that MRVF outperforms existing value factorization methods, particularly in scenarios with highly non-monotonic payoffs.
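The toy sketch below is only an illustration of the idea stated in the abstract, not the authors' method: on a non-monotonic one-step matrix game, a crude averaging "factorization" (a stand-in for a learned monotonic mixer) settles on a suboptimal joint action, while repeatedly re-factorizing the clipped (non-negative) incremental payoff relative to the previous round's solution moves it to the global optimum. The averaging surrogate, the `max(., 0)` clipping, and all names here are assumptions for illustration.

```python
# Hypothetical sketch (not the paper's code): multi-round refinement on a
# non-monotonic 2-agent one-step matrix game.
import numpy as np

# Non-monotonic payoff: the joint optimum (0, 0) sits next to heavy penalties.
payoff = np.array([
    [  8.0, -12.0, -12.0],
    [-12.0,   0.0,   0.0],
    [-12.0,   0.0,   6.0],
])

def monotonic_greedy(q):
    """Stand-in for a monotonic factorization: each agent acts greedily on its
    mean utility over the other agent's actions (decentralizable by design)."""
    return int(np.argmax(q.mean(axis=1))), int(np.argmax(q.mean(axis=0)))

# Single-round factorization of the raw payoff lands on the safe but
# suboptimal joint action: the penalties drag the optimum's utility down.
action = monotonic_greedy(payoff)
print("single round:", action, "payoff", payoff[action])   # (2, 2), payoff 6.0

# Multi-round refinement: factorize the non-negative incremental payoff
# relative to the preceding round's solution until no improvement remains.
for round_idx in range(5):
    incremental = np.maximum(payoff - payoff[action], 0.0)  # clip below zero
    next_action = monotonic_greedy(incremental)
    print(f"round {round_idx}: {action} -> {next_action}, "
          f"payoff {payoff[action]} -> {payoff[next_action]}")
    if next_action == action:   # no positive increment left: converged
        break
    action = next_action        # reaches (0, 0) with payoff 8.0
```

Clipping removes the deep negative entries, so the incremental surface is far closer to monotonic than the raw payoff and the better joint action stands out even under the crude surrogate factorization; the paper's actual mechanism for measuring and factorizing the increment is given in the submission itself.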
Primary Area: reinforcement learning
Submission Number: 15823