Keywords: Multi-agent, Reinforcement Learning, Mutual Information, Duality, Policy Gradient, Social Graph
Abstract: Social behavior change in a population has long been studied as an essential component of multi-agent learning. Learning behavioral change not only involves reinforcement learning (RL) but must also be measured against the general population using mutual information (MI). Combining RL and MI leads us to derive MI optimization objectives from the policy gradient. With MI as the multi-agent optimization objective, we find that the dual properties of MI can result in distinctly different population behaviors. MI maximization maximizes the stability of a population, whereas MI minimization enables specialization among the agents, so the dual of MI significantly changes a population's behavioral properties. In this paper, we propose a minimax formulation of MI (M&M) that enables agent specialization with stable regularization. Empirically, we evaluate M&M against the prior SOTA MARL framework and analyze the social behavior change in terms of performance, diversity, and the stability of their social graphs.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)
TL;DR: Social behavior change in population learning is shaped by the dual properties of mutual information: maximization stabilizes the population, while minimization enables specialization among agents.