Keywords: Exploration, Sparse Rewards, Ensemble Methods, Information-Theoretic Measures, Multi-Agent Reinforcement Learning, MARL
TL;DR: We tackle the challenges of sparse rewards in MARL by proposing an ensemble-based framework with information-theoretic rewards that enhance exploration efficiency and cooperative behavior.
Abstract: Efficient exploration in multi-agent reinforcement learning (MARL) remains a fundamental challenge, particularly in complex cooperative tasks with sparse rewards.
In MARL, agents must discover both novel and strongly cooperative state–action pairs in a high-dimensional joint state–action space to effectively facilitate policy learning.
In this paper, we propose Ensemble-based Epistemic and Cooperative Exploration (EECE), a unified framework that leverages an ensemble dynamics model to simultaneously capture epistemic uncertainty for directed exploration and the level of cooperation required for coordinated behavior discovery.
To achieve this, EECE introduces two information-theoretic intrinsic rewards: (i) an epistemic information gain signal that directs agents toward transitions with high uncertainty, and (ii) a cooperative signal that rewards the aggregated marginal influence of individual agents on global state variation, quantified via mutual information. It then employs a dynamic weighting strategy to leverage the complementary effects of the two intrinsic rewards during training. Moreover, it incorporates a dual-policy mechanism that stabilizes exploration and avoids introducing additional non-stationarity and credit-assignment issues.
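For intuition only, the epistemic signal from an ensemble dynamics model is often approximated by prediction disagreement across ensemble members. The sketch below is a generic disagreement-based bonus under that assumption, not EECE's actual formulation; the ensemble, the dimensions, and the `epistemic_bonus` helper are all hypothetical stand-ins (random linear maps replace trained next-state predictors):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ensemble of K dynamics models; each "model" is a random
# linear map from (state, action) to a predicted next state, standing in
# for a trained predictor.
K, state_dim, action_dim = 5, 8, 2
ensemble = [rng.normal(size=(state_dim, state_dim + action_dim)) * 0.1
            for _ in range(K)]

def epistemic_bonus(state, action):
    """Disagreement proxy for epistemic uncertainty: variance of the K
    next-state predictions, averaged over state dimensions. High
    disagreement flags transitions the ensemble has not yet learned."""
    x = np.concatenate([state, action])
    preds = np.stack([W @ x for W in ensemble])  # shape (K, state_dim)
    return float(preds.var(axis=0).mean())

s = rng.normal(size=state_dim)
a = rng.normal(size=action_dim)
bonus = epistemic_bonus(s, a)  # non-negative scalar intrinsic reward
```

Such a bonus would then be mixed with a cooperative signal via a (possibly time-varying) weight, e.g. `r_int = w * bonus + (1 - w) * r_coop`, mirroring the dynamic weighting the abstract describes.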
We demonstrate the advantages of our method through cooperative benchmarks with sparse rewards, including the StarCraft Multi-Agent Challenge (SMAC) and Google Research Football (GRF), showing that EECE achieves substantial improvements in both exploration efficiency and final performance.
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 8669