Consistent epistemic planning for multiagent deep reinforcement learning

Published: 01 Jan 2024 · Last Modified: 11 Nov 2024 · Int. J. Mach. Learn. Cybern., 2024 · License: CC BY-SA 4.0
Abstract: Multiagent cooperation in a partially observable environment without communication is difficult because of the agents' uncertainty about the environment and about one another. Traditional multiagent deep reinforcement learning (MADRL) algorithms fail to address this uncertainty. We propose a MADRL-based policy network architecture, the shared mental model-multiagent epistemic planning policy (SMM-MEPP), to resolve this issue. First, the architecture combines multiagent epistemic planning with MADRL to create a "perception–planning–action" multiagent epistemic planning framework, helping multiple agents better handle uncertainty in the absence of coordination. Second, by introducing mental models and representing them as neural networks, a parameter-sharing mechanism is used to create shared mental models, which maintain the consistency of multiagent planning without communication and improve the efficiency of cooperation. Finally, we apply the SMM-MEPP architecture to three advanced MADRL algorithms (MAAC, MADDPG, and MAPPO) and conduct comparative experiments on multiagent cooperation tasks. The results show that the proposed method provides consistent planning for multiple agents and improves convergence speed or training performance in partially observable environments without communication.
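To make the parameter-sharing idea concrete, the following is a minimal sketch, assuming a PyTorch-style implementation; the class names `SharedMentalModel` and `EpistemicPolicy`, the layer sizes, and the variable names are all illustrative assumptions, not the authors' code. It shows the core mechanism the abstract describes: every agent queries the same mental-model network instance, so their beliefs stay consistent without communication.

```python
# Hypothetical sketch of a shared mental model via parameter sharing.
# Not the paper's implementation; names and dimensions are assumptions.
import torch
import torch.nn as nn

class SharedMentalModel(nn.Module):
    """Maps an agent's local observation to a belief embedding.
    A single instance is shared by all agents (parameter sharing),
    so every agent reasons with the same learned model."""
    def __init__(self, obs_dim: int, belief_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, belief_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class EpistemicPolicy(nn.Module):
    """Per-agent policy conditioned on the shared belief,
    mirroring the perception -> planning -> action pipeline."""
    def __init__(self, obs_dim: int, belief_dim: int, act_dim: int,
                 mental_model: SharedMentalModel):
        super().__init__()
        self.mental_model = mental_model  # same object for every agent
        self.actor = nn.Sequential(
            nn.Linear(obs_dim + belief_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        belief = self.mental_model(obs)   # planning on the shared belief
        return self.actor(torch.cat([obs, belief], dim=-1))  # action logits

# Usage: one mental-model instance, several policies referencing it.
shared = SharedMentalModel(obs_dim=8, belief_dim=16)
agents = [EpistemicPolicy(8, 16, 4, shared) for _ in range(3)]
obs_batch = torch.randn(3, 8)            # one local observation per agent
logits = [agent(o.unsqueeze(0)) for agent, o in zip(agents, obs_batch)]
```

Because `shared` is a single `nn.Module` referenced by all policies, gradient updates from any agent modify one common set of mental-model parameters; in this sketch, that is what keeps the agents' plans mutually consistent without message passing.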