Keywords: Multi-agent Reinforcement Learning, Multi-agent Cross-Entropy Method
TL;DR: We propose the multi-agent cross-entropy method (MCEM) with nonlinear critic decomposition (NCD) to overcome centralized-decentralized mismatch (CDM) in cooperative MARL, improving sample efficiency with a modified off-policy k-step return and achieving better results than prior methods on continuous and discrete action benchmarks.
Abstract: Cooperative multi-agent reinforcement learning (MARL) commonly adopts centralized training with decentralized execution (CTDE), where centralized critics leverage global information to guide decentralized actors. However, centralized-decentralized mismatch (CDM) arises when one agent's suboptimal behavior degrades the learning of the others. Prior approaches mitigate CDM through value decomposition: linear decompositions admit per-agent gradients at the cost of limited expressiveness, while nonlinear decompositions improve representational power but require centralized gradients, reintroducing CDM. To overcome this trade-off, we propose the multi-agent cross-entropy method (MCEM), combined with monotonic nonlinear critic decomposition (NCD). MCEM updates each policy by increasing the probability of high-value joint actions, thereby excluding suboptimal behaviors from the update. For sample efficiency, we extend off-policy learning with a modified $k$-step return and Retrace. Analysis and experiments demonstrate that MCEM outperforms state-of-the-art methods on both continuous and discrete action benchmarks.
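To make the policy-update idea concrete, here is a minimal, hypothetical sketch of a generic cross-entropy-style update for factored multi-agent policies: sample joint actions, score them with a centralized value function (a toy payoff here), and raise each agent's probability of the elite joint actions. This is an illustration under stated assumptions (the `joint_value` payoff, `ELITE_FRAC`, and the logit step size are all invented for this sketch), not the paper's MCEM/NCD algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 2          # hypothetical: number of cooperating agents
N_ACTIONS = 4         # hypothetical: discrete actions per agent
N_SAMPLES = 64        # joint actions sampled per update
ELITE_FRAC = 0.25     # fraction of samples treated as "high-value"

# Per-agent categorical policies (logits), updated independently.
logits = np.zeros((N_AGENTS, N_ACTIONS))

def joint_value(joint_action):
    """Stand-in for a centralized critic Q(s, a_1..a_N): a toy payoff
    that rewards the agents for coordinating on action 0."""
    return float(np.all(joint_action == 0)) + 0.1 * rng.standard_normal()

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

for step in range(200):
    probs = softmax(logits)                      # (N_AGENTS, N_ACTIONS)
    # Sample joint actions from the factored policy.
    samples = np.stack(
        [rng.choice(N_ACTIONS, size=N_SAMPLES, p=probs[i])
         for i in range(N_AGENTS)],
        axis=1,
    )                                            # (N_SAMPLES, N_AGENTS)
    values = np.array([joint_value(a) for a in samples])
    # Keep only the elite (high-value) joint actions ...
    n_elite = max(1, int(ELITE_FRAC * N_SAMPLES))
    elite = samples[np.argsort(values)[-n_elite:]]
    # ... and move each agent's action distribution toward the elite
    # empirical distribution (a cross-entropy step); low-value joint
    # actions contribute nothing to the update.
    for i in range(N_AGENTS):
        counts = np.bincount(elite[:, i], minlength=N_ACTIONS)
        target = counts / counts.sum()
        logits[i] += 0.5 * (target - probs[i])   # small CE gradient step

print(softmax(logits).round(2))  # each agent concentrates on action 0
```

The relevant design point is that only elite joint actions enter each agent's target distribution, so one agent's low-value behavior never shapes another agent's update, which matches the intuition the abstract gives for avoiding CDM.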
Primary Area: reinforcement learning
Submission Number: 6739