LLM-Assisted Semantically Diverse Teammate Generation for Efficient Multi-agent Coordination

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, License: CC BY 4.0
TL;DR: We utilize LLMs to propose novel coordination behaviors described in natural language, and then transform them into teammate policies, enhancing teammate diversity and interpretability.
Abstract: Training with diverse teammates is key to learning generalizable agents. Typical approaches generate diverse teammates with techniques such as randomization, regularization terms, or reduced policy compatibility. However, such teammates lack semantic information, resulting in inefficient teammate generation and poor adaptability of the agents. To tackle these challenges, we propose Semantically Diverse Teammate Generation (SemDiv), a novel framework that leverages the capabilities of large language models (LLMs) to discover and learn diverse coordination behaviors at the semantic level. In each iteration, SemDiv first generates a novel coordination behavior described in natural language, then translates it into a reward function to train a teammate policy. Once the teammate policy is verified to be meaningful, novel, and aligned with the described behavior, the agents train a corresponding coordination policy. Through this iterative process, SemDiv efficiently generates a diverse set of semantically grounded teammates, enabling agents to develop specialized policies and to select the most suitable one through language-based reasoning when adapting to unseen teammates. Experiments show that SemDiv generates teammates covering a wide range of coordination behaviors, including those unreachable by baseline methods. Evaluation across four MARL environments, each with five unseen representative teammates, demonstrates SemDiv's superior coordination and adaptability. Our code is available at https://github.com/lilh76/SemDiv.
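A minimal pseudocode sketch of the iterative loop described in the abstract is given below; all helper callables (propose_behavior, behavior_to_reward, train_teammate, verify, train_coordinator) are hypothetical placeholders for illustration and do not correspond to the API of the released implementation.

def semdiv_loop(propose_behavior, behavior_to_reward, train_teammate,
                verify, train_coordinator, n_iterations):
    teammate_pool = []          # (behavior description, teammate policy) pairs
    coordinator_policies = []   # ego-agent policies, one per accepted teammate

    for _ in range(n_iterations):
        # 1. LLM proposes a novel coordination behavior in natural language,
        #    conditioned on the behaviors discovered so far.
        behavior = propose_behavior([b for b, _ in teammate_pool])

        # 2. LLM translates the description into an executable reward function.
        reward_fn = behavior_to_reward(behavior)

        # 3. Train a teammate policy with MARL under that reward.
        teammate = train_teammate(reward_fn)

        # 4. Keep the teammate only if it is meaningful, novel, and aligned
        #    with the described behavior; otherwise try another proposal.
        if not verify(teammate, behavior, teammate_pool):
            continue
        teammate_pool.append((behavior, teammate))

        # 5. Train an ego-agent policy specialized to the new teammate.
        coordinator_policies.append(train_coordinator(teammate))

    return teammate_pool, coordinator_policies

At deployment, the stored natural-language behavior descriptions let the agent reason about an unseen teammate and select the most suitable specialized policy, as the abstract describes.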
Lay Summary: How can we train learning agents to effectively generalize and adapt to unseen teammates? We address this question by introducing a novel framework called Semantically Diverse Teammate Generation (SemDiv), which leverages large language models (LLMs) to discover and learn diverse coordination behaviors at the semantic level. Our paper presents two key findings. First, we demonstrate that traditional approaches to teammate diversity—such as randomization, regularization, or policy compatibility reduction—often fail to produce semantically meaningful behaviors, limiting both the efficiency of teammate generation and the adaptability of trained agents. This is surprising because prior work assumed that procedural diversity (e.g., varying policy parameters) was sufficient for generalization. Second, we show that by grounding teammate generation in natural language descriptions of coordination behaviors—and then translating these into reward functions—SemDiv produces a richer, more interpretable set of teammates than previously possible. Our results have implications for how we design multi-agent learning systems, suggesting that semantic diversity, rather than purely algorithmic variation, is crucial for developing agents that can reason about and adapt to novel partners. Experiments across four multi-agent environments confirm that SemDiv generates a broader range of coordination behaviors than baseline methods, enabling superior generalization to unseen teammates.
Link To Code: https://github.com/lilh76/SemDiv
Primary Area: Reinforcement Learning->Multi-agent
Keywords: Multi-agent Reinforcement Learning, Large Language Models, Multi-agent Coordination
Submission Number: 4527