Multi-agent Exploration with Sub-state Entropy Estimation

Jian Tao, Yangkun Chen, Yang Zhang, Kai Yang, Xiu Li

Published: 2024, Last Modified: 16 Dec 2024, IJCNN 2024, CC BY-SA 4.0

Abstract: Researchers have integrated exploration techniques into multi-agent reinforcement learning (MARL) algorithms, drawing on the remarkable success of these techniques in deep reinforcement learning. Nonetheless, exploration in MARL is a more substantial challenge, as agents need to coordinate their efforts to achieve comprehensive state coverage. In this setting, agents can struggle to agree on which kinds of states warrant exploration. We introduce Multi-agent Exploration based on Sub-state Entropy (MESE) to address this limitation. This novel approach incentivizes agents to explore states cooperatively by directing them toward consensus via an extra team reward. The additional reward is computed from the novelty of the current sub-state that merits cooperative exploration. MESE selects the sub-state with a conditioned entropy approach, which uses particle-based entropy estimation to calculate the entropy, and uses Random Network Distillation (RND) to calculate the team's intrinsic reward. MESE is a plug-and-play module that can be seamlessly integrated into most existing MARL algorithms, which makes it a highly effective tool for reinforcement learning. Our experiments demonstrate that MESE can substantially improve the performance of MAPPO and QMIX on various tasks in the StarCraft Multi-Agent Challenge (SMAC).
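To make the two building blocks named in the abstract concrete, the following is a minimal NumPy sketch of a particle-based (k-nearest-neighbour) entropy proxy and an RND-style novelty reward. It is an illustration under stated assumptions, not the authors' implementation: the function and class names (`knn_entropy`, `RND`), the linear predictor, the `log(d + 1)` form of the estimator, and all hyperparameters are hypothetical choices. The abstract does not specify the conditioned-entropy rule used to select the sub-state, so that step is not implemented here; a sub-state is simply taken to be a subset of state dimensions.

```python
import numpy as np


def knn_entropy(particles, k=3):
    """Particle-based entropy proxy: mean log distance to the k-th nearest neighbour.

    particles: array of shape (N, D) with N > k. Larger values mean the
    particles are more spread out. This is a simplified estimator chosen
    for illustration (an assumption, not the paper's exact formula).
    """
    diffs = particles[:, None, :] - particles[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)        # (N, N) pairwise distances
    kth = np.sort(dists, axis=1)[:, k]            # index 0 is the self-distance 0
    return float(np.mean(np.log(kth + 1.0)))      # +1 keeps the log finite


class RND:
    """Toy Random Network Distillation: a trained linear predictor tries to
    match a fixed random target network; the prediction error is the reward.
    Assumes roughly normalised inputs so the fixed step size stays stable."""

    def __init__(self, in_dim, feat_dim=16, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.w_target = rng.normal(size=(in_dim, feat_dim))  # fixed, never updated
        self.w_pred = np.zeros((in_dim, feat_dim))            # trained predictor
        self.lr = lr

    def _error(self, x):
        return x @ self.w_pred - np.tanh(x @ self.w_target)

    def reward(self, x):
        """Per-sample intrinsic reward: mean squared prediction error."""
        return np.mean(self._error(x) ** 2, axis=-1)

    def update(self, x):
        """One gradient step on the squared error over the batch x."""
        self.w_pred -= self.lr * (x.T @ self._error(x)) / len(x)


def sub_state(states, dims):
    """A sub-state is taken here to be a subset of state dimensions (assumption)."""
    return states[:, dims]
```

In use, one would score each candidate group of dimensions with `knn_entropy(sub_state(states, dims))`, choose a group by whatever selection rule the method prescribes, and feed the chosen sub-state to `RND.reward` to obtain a team reward that shrinks as the sub-state is visited and `RND.update` is called on it.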