Boosting Multiagent Reinforcement Learning at High Replay Ratios with Ensemble Reset

16 Sept 2025 (modified: 08 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: MARL; replay ratio; dormant neuron; sample efficiency
TL;DR: We propose EnSet to boost MARL at high replay ratios.
Abstract: Reinforcement learning with a high replay ratio, in which the agent's network parameters are updated multiple times per environment interaction, is an emerging way to improve sample efficiency. However, this paradigm remains underexplored in multiagent reinforcement learning (MARL). In this paper, we investigate how to train MARL efficiently at high replay ratios to accelerate learning. Surprisingly, we find that simply increasing the replay ratio induces severe neuron dormancy in the centralized global Q-value network: neurons become inactive, undermining network expressivity and hindering learning. To tackle this challenge, we propose Ensemble Reset (EnSet), which boosts MARL at high replay ratios in two ways. First, EnSet is the first method to use an ensemble of global Q-value networks with parameter resets to reduce dormant neurons under high-frequency updates. Second, EnSet diversifies replay experience using a multiagent translation-invariance prior on the global Q-function to prevent overfitting. Extensive experiments on SMAC, MPE, and SMACv2 show that EnSet substantially speeds up various MARL algorithms at high replay ratios.
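The dormant-neuron-and-reset idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the activation-score definition, the threshold `tau`, and the He-style reinitialization follow the common dormant-neuron formulation from the literature and are assumptions here. A hidden neuron whose mean activation (normalized by the layer average) falls below `tau` is treated as dormant; its incoming weights are reinitialized and its outgoing weights are zeroed so the reset does not perturb the network's current output.

```python
import numpy as np

def dormant_mask(activations, tau=0.025):
    """activations: (batch, n_neurons) post-ReLU outputs of one hidden layer.
    Returns a boolean mask marking neurons whose normalized mean activation
    is at most tau (i.e., dormant)."""
    score = np.abs(activations).mean(axis=0)
    score = score / (score.mean() + 1e-8)  # normalize by the layer-wide average
    return score <= tau

def reset_dormant(W_in, b_in, W_out, mask, rng):
    """Reset dormant neurons: reinitialize their incoming weights (He init for
    ReLU) and zero their outgoing weights so the function output is unchanged."""
    W_in, b_in, W_out = W_in.copy(), b_in.copy(), W_out.copy()
    n_in = W_in.shape[0]
    scale = np.sqrt(2.0 / n_in)
    W_in[:, mask] = rng.normal(0.0, scale, size=(n_in, mask.sum()))
    b_in[mask] = 0.0
    W_out[mask, :] = 0.0
    return W_in, b_in, W_out
```

In EnSet, a reset like this would be applied periodically to members of the global Q-network ensemble, so that at any time some members retain learned parameters while others recover expressivity lost to dormancy.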
Primary Area: reinforcement learning
Submission Number: 7452