Mitigating Variance Caused by Communication in Decentralized Multi-agent Deep Reinforcement Learning
Abstract: Communication can help agents gain a better understanding of the environment and coordinate their behaviors in multi-agent deep reinforcement learning (MADRL). However, in certain applications, communication is not available during execution due to factors such as security concerns or limited resources. This paper focuses on a decentralized MADRL setting in which communication is used only during training, enabling agents to learn coordinated behaviors while keeping execution fully decentralized. While beneficial, communication can introduce uncertainty, potentially increasing the variance in the learning process of decentralized agents. We conduct the first theoretical analysis of the variance that communication introduces into policy gradients in actor-critic methods. Guided by these analytical findings, we propose modular techniques to reduce the variance of policy gradients under communication. We incorporate these techniques into two existing algorithms for decentralized MADRL with communication and evaluate them on multiple multi-agent tasks in the StarCraft Multi-Agent Challenge and Traffic Junction domains. The results demonstrate that decentralized MADRL communication methods extended with our proposed techniques not only achieve high-performing agents but also reduce the variance of policy gradients during training.
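As a purely illustrative aside (not taken from the submission), the sketch below shows the kind of effect the abstract describes: if a communicated signal is noisy and that noise enters the baseline/critic used by a decentralized actor, REINFORCE-style gradient estimates remain unbiased but their variance grows. The two-action bandit, the noise levels, and the helper names (`grad_samples`, `MESSAGE_NOISE`) are assumptions chosen for illustration, not the paper's construction.

```python
# Minimal sketch (assumptions only): extra stochasticity from a communicated
# quantity inflating the variance of REINFORCE-style policy-gradient estimates.
import numpy as np

rng = np.random.default_rng(0)

# Two-action bandit: action 1 pays 1.0 on average, action 0 pays 0.0.
MEAN_REWARD = np.array([0.0, 1.0])
REWARD_NOISE = 0.5      # intrinsic environment noise (assumed)
MESSAGE_NOISE = 1.0     # assumed noise of a teammate's communicated signal

theta = np.zeros(2)     # softmax logits of the (decentralized) actor


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


def grad_samples(n, baseline_noise_std):
    """Return n per-sample REINFORCE gradients w.r.t. the logits.

    The baseline is the exact expected reward under the current policy;
    `baseline_noise_std` models extra stochasticity entering the critic
    through a noisy communicated message (0.0 = no communication noise).
    """
    pi = softmax(theta)
    grads = np.empty((n, 2))
    for i in range(n):
        a = rng.choice(2, p=pi)
        r = MEAN_REWARD[a] + rng.normal(0.0, REWARD_NOISE)
        b = pi @ MEAN_REWARD + rng.normal(0.0, baseline_noise_std)
        score = -pi.copy()
        score[a] += 1.0                  # d log pi(a) / d theta
        grads[i] = score * (r - b)
    return grads


for label, msg_std in [("no communication noise", 0.0),
                       ("noisy communicated baseline", MESSAGE_NOISE)]:
    g = grad_samples(20_000, msg_std)
    print(f"{label:>30s}: mean={g.mean(axis=0).round(3)}, "
          f"variance={g.var(axis=0).round(3)}")
```

Both estimators have approximately the same mean, but the run with the message-corrupted baseline shows visibly larger per-component variance, which is the phenomenon the proposed techniques aim to mitigate.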
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: The paper was using an incorrect font due to a misconfigured LaTeX package, but the issue has now been resolved.
Assigned Action Editor: ~Yaodong_Yang1
Submission Number: 4704