Dynamic population-based meta-learning for multi-agent communication with natural language

Abhinav Gupta; Marc Lanctot; Angeliki Lazaridou

Published: 09 Nov 2021, Last Modified: 05 May 2023, NeurIPS 2021 Poster, Readers: Everyone

Keywords: emergent communication, meta-learning, population-based methods, multi-agent reinforcement learning, few-shot generalization, human-AI coordination

TL;DR: We propose a dynamic population-based meta-learning method to train agents in a cooperative multi-agent communication environment.

Abstract: In this work, our goal is to train agents that can coordinate with seen, unseen, and human partners in a multi-agent communication environment involving natural language. Previous work using a single set of agents has shown great progress in generalizing to known partners; however, it struggles when coordinating with unfamiliar agents. To mitigate this, recent work has explored population-based approaches, in which multiple agents interact with each other with the goal of learning more generic protocols. Although these methods can yield good coordination between unseen partners, they do so only for simple languages and thus fail to adapt to human partners using natural language. We attribute this to the use of static populations and instead propose a dynamic population-based meta-learning approach that builds such a population iteratively. We perform a holistic evaluation of our method on two different referential games and show that our agents outperform all prior work when communicating with seen partners and humans. Furthermore, we analyze the natural language generation skills of our agents and find that they also outperform strong baselines. Finally, we test the robustness of our agents when communicating with out-of-population agents and assess the importance of each component of our method through ablation studies.

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Supplementary Material: pdf
