- Keywords: multiagent reinforcement learning
- Abstract: There are several real-world tasks that would benefit from applying multiagent reinforcement learning (MARL) algorithms, including the coordination among multiple agents such as self-driving cars or autonomous delivery drones. Real-world conditions are a challenging environment for multiagent systems due to the environment's partially observable, nonstationary nature. Moreover, if agents must share a limited resource (e.g., communication network bandwidth) they must all learn how to coordinate resource use. These aspects make learning very challenging. This paper introduces a deep recurrent multiagent actor-critic framework for handling multiagent coordination under partial observable settings and limited communication. We investigate the recurrency effects on the performance and communication use of a team of agents, and demonstrate that the resulting framework is capable of learning time-dependencies for not only sharing missing observations but also handling resource limitations. It gives rise to different communication patterns among agents, which still perform equivalently well as current multiagent actor-critic methods under fully observable settings.