Multi-agent Reinforcement Learning with Emergent Communication using Discrete and Non-differentiable Messages
Abstract: This paper proposes MASAC-ECo, an integrated model of multi-agent reinforcement learning with emergent communication based on probabilistic generative models, which enables two agents to learn cooperative actions. In this model, agents exchange messages as discrete symbols that communicate each agent's state, based on the Metropolis-Hastings naming game (MHNG). Through the MHNG, messages can emerge without either agent directly observing the other's state, and the emergent messages allow each agent to infer the other agent's state indirectly. Each agent's policy is learned with a soft actor-critic, and because the emergent message is an input to the soft actor-critic, each agent can learn cooperative actions conditioned on its own state and the received message. Experiments demonstrate that MASAC-ECo can learn cooperative actions. Moreover, the results show that its performance is comparable to that of the conventional method, even though the latter can access the other agent's state directly during training, whereas MASAC-ECo cannot.
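For intuition, the core of the MHNG is a Metropolis-Hastings acceptance test: the listener accepts the speaker's proposed discrete message with probability min(1, ratio), where the ratio compares the proposed and current messages under the listener's *own* model, so a shared sign can emerge without either agent observing the other's state. The following is a minimal sketch of that acceptance step only; the toy categorical model and all names are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_accept(p_proposed: float, p_current: float) -> bool:
    """Metropolis-Hastings acceptance: accept the speaker's proposed
    message with probability min(1, p_proposed / p_current), where both
    probabilities are evaluated under the listener's own model."""
    ratio = p_proposed / max(p_current, 1e-12)  # guard against division by zero
    return rng.random() < min(1.0, ratio)

# Hypothetical toy belief: the listener's categorical distribution
# P(message | own state). MASAC-ECo learns such models from observations.
listener_model = np.array([0.7, 0.2, 0.1])

current_message = 2
proposed_message = 0  # sampled by the speaker from its own model

if mh_accept(listener_model[proposed_message], listener_model[current_message]):
    # The shared sign is updated without either agent
    # revealing its state to the other.
    current_message = proposed_message
```

In the full model, the accepted message would then be concatenated with the agent's own state as input to its soft actor-critic policy, which is how the policy becomes conditioned on both.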