Keywords: multi-agent reinforcement learning, human-AI collaboration, multi-agent systems, zero-shot coordination
Abstract: Adaptation is the cornerstone of effective collaboration among heterogeneous team members. In human-agent teams, artificial agents need to adapt to their human partners in real time, as individuals often have unique preferences and policies that may change dynamically throughout interactions. This becomes particularly challenging in tasks with time pressure and complex strategic spaces, where identifying partner behaviors and selecting suitable responses is difficult.
In this work, we introduce a strategy-conditioned cooperator framework that learns to represent, categorize, and adapt to a broad range of potential partner strategies in real time.
Our approach encodes strategies with a variational autoencoder to learn a latent strategy space from agent trajectory data, identifies distinct strategy types through clustering, and trains a cooperator agent conditioned on these clusters by generating partners of each strategy type.
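To make the pipeline concrete, below is a minimal sketch of the encoding and clustering steps, assuming trajectories are pre-featurized into fixed-length vectors; all module names, dimensions, and the choice of k-means are illustrative assumptions rather than the paper's actual implementation.
```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class TrajectoryVAE(nn.Module):
    """Toy VAE over featurized trajectories (architecture is assumed)."""
    def __init__(self, traj_dim: int, latent_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(traj_dim, 128), nn.ReLU())
        self.mu_head = nn.Linear(128, latent_dim)      # posterior mean
        self.logvar_head = nn.Linear(128, latent_dim)  # posterior log-variance
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, traj_dim)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to a standard normal prior.
    recon_term = ((recon - x) ** 2).sum(dim=-1).mean()
    kl_term = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)).mean()
    return recon_term + kl_term

def cluster_strategies(vae: TrajectoryVAE, trajs: torch.Tensor, n_clusters: int = 4):
    """After training, cluster posterior means to identify strategy types."""
    with torch.no_grad():
        mu = vae.mu_head(vae.encoder(trajs))
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(mu.numpy())
```
Each resulting cluster label could then serve as the conditioning signal for the cooperator, with scripted or learned partners generated per cluster for training.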
For online adaptation to novel partners, we leverage a fixed-share regret minimization algorithm that dynamically infers and updates an estimate of the partner's strategy during interaction.
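As a sketch of this online inference step: the classical fixed-share forecaster (Herster & Warmuth's "tracking the best expert" setting) maintains a weight per strategy cluster, exponentially downweighting clusters whose predictions mismatch the partner's observed behavior while mixing a small share of weight back toward uniform so the estimate can recover after a mid-episode strategy switch. The loss signal and parameter values below are illustrative assumptions.
```python
import numpy as np

class FixedShare:
    """Fixed-share weights over K strategy clusters (parameters assumed)."""
    def __init__(self, n_experts: int, eta: float = 0.5, alpha: float = 0.05):
        self.w = np.full(n_experts, 1.0 / n_experts)  # uniform prior
        self.eta = eta      # learning rate for the exponential update
        self.alpha = alpha  # share rate; enables tracking a switching partner

    def update(self, losses: np.ndarray) -> np.ndarray:
        # Exponential-weights step: downweight clusters that explained the
        # partner's last action poorly (e.g., losses = negative log-likelihood
        # of the observed action under each cluster's policy -- an assumption).
        v = self.w * np.exp(-self.eta * losses)
        v /= v.sum()
        # Fixed-share step: redistribute a small fraction uniformly so
        # previously discarded clusters can regain weight after a switch.
        self.w = self.alpha / len(v) + (1.0 - self.alpha) * v
        return self.w

# Example: hypothetical per-step losses; cluster 1 explains behavior best.
fs = FixedShare(n_experts=4)
print(fs.update(np.array([0.9, 0.1, 0.8, 0.7])))
```
The cooperator could then be conditioned on the weighted estimate (or the argmax cluster) at each step of the interaction.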
We evaluate our method in a modified version of the Overcooked domain, a complex collaborative cooking environment that requires effective coordination between two players across a diverse space of potential strategies.
Through these experiments and an online user study, we demonstrate that our proposed agent achieves state-of-the-art performance compared to existing baselines when paired with novel human and agent teammates.
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 25491