Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 oral · CC BY 4.0
TL;DR: Learning to generalize to novel coordination tasks with one partner lets you generalize to novel partners as well
Abstract: Zero-shot coordination (ZSC), the ability to adapt to a new partner in a cooperative task, is a critical component of human-compatible AI. While prior work has focused on training agents to cooperate on a single task, these specialized models do not generalize to new tasks, even highly similar ones. Here, we study how reinforcement learning on a **distribution of environments with a single partner** enables learning general cooperative skills that support ZSC with **many new partners on many new problems**. We introduce *two* JAX-based procedural generators that create billions of solvable coordination challenges. We develop a new paradigm called **Cross-Environment Cooperation (CEC)** and show that it outperforms competitive baselines both quantitatively and qualitatively when collaborating with real people. Our findings suggest that learning to collaborate across many unique scenarios encourages agents to develop general norms, which prove effective for collaboration with different partners. Together, our results suggest a new route toward designing generalist cooperative agents capable of interacting with humans without requiring human data.
Lay Summary: Imagine we want to create an artificial intelligence (AI) that can work together with people on all sorts of tasks, even tasks it's never seen before. Right now, most AI systems are trained for one specific job. They get really good at it, but if you give them a slightly different task, even a very similar one, they're lost. They can't adapt. We're exploring a different way. Instead of training an AI for a single task, we train it on a wide variety of tasks with just one partner. Think of it like a human learning to play many different sports with the same friend. This helps the AI learn general teamwork skills. To make this possible, we created two special computer programs that can generate billions of unique cooperation challenges. Then, we developed a new training method called Cross-Environment Cooperation (CEC). Our CEC method proved to be much better than other approaches when we tested our AI collaborating with real people. It seems that by learning to cooperate in many different situations, the AI develops a sense of "general norms" or unspoken rules for teamwork. These general norms then allow it to work effectively with completely different partners on brand new problems. Our research suggests a promising path toward creating AI that can genuinely work alongside humans without needing a lot of human-specific training data.
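To make the core idea concrete, here is a minimal, hypothetical sketch of the CEC training loop described above: each episode samples a *new* procedurally generated task, while the partner stays fixed (self-play with a shared policy). All names (`generate_env`, `policy`, `cec_training_sketch`) and the toy "reach a target sum together" task are illustrative assumptions, not the paper's actual JAX implementation; see the linked repository for the real generators and training code.

```python
import random

def generate_env(seed):
    """Hypothetical stand-in for the paper's procedural generators:
    returns a toy coordination task (a target sum two agents must
    jointly hit). The real generators build solvable grid-world layouts."""
    rng = random.Random(seed)
    return {"target": rng.randint(2, 10)}

def policy(obs, rng):
    # One shared policy controls both agents (single-partner self-play).
    # A real agent would condition on obs; this toy version acts randomly.
    return rng.randint(1, 5)

def play_episode(env, rng):
    # Cooperative reward: both agents succeed or fail together.
    a1, a2 = policy(env, rng), policy(env, rng)
    return 1.0 if a1 + a2 == env["target"] else 0.0

def cec_training_sketch(num_envs=1000, seed=0):
    """Cross-Environment Cooperation (sketch): vary the environment every
    episode, keep the partner fixed, and average cooperative reward.
    A real implementation would run a PPO/IPPO update after each batch."""
    rng = random.Random(seed)
    total = 0.0
    for env_seed in range(num_envs):
        env = generate_env(env_seed)  # fresh coordination task each episode
        total += play_episode(env, rng)
    return total / num_envs

print(cec_training_sketch())
```

The key design point this sketch illustrates is the axis of variation: standard ZSC methods vary the *partner* on one task, while CEC varies the *task* with one partner, which is what the paper argues drives general cooperative norms.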
Link To Code: https://github.com/KJha02/crossEnvCooperation
Primary Area: Reinforcement Learning->Multi-agent
Keywords: Zero-shot Coordination, Human-AI Collaboration, Multi-agent Interactions
Submission Number: 12934