Haland: Human-AI Coordination via Policy Generation from Language-guided Diffusion

Lei Yuan; Kunmin Lin; Ziqian Zhang; Lihe Li; Feng Chen; Jingyu Ru; Cong Guan; Yang Yu

Haland: Human-AI Coordination via Policy Generation from Language-guided Diffusion

Lei Yuan, Kunmin Lin, Ziqian Zhang, Lihe Li, Feng Chen, Jingyu Ru, Cong Guan, Yang Yu

26 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: Multi-agent Reinforcement Learning, Human-AI Coordination, Coordination and Cooperation, Reinforcement Learning

TL;DR: A novel approach for efficient Human-AI Coordination based on diffusion and language

Abstract:

Developing intelligent agents that can effectively coordinate with diverse human partners is a fundamental goal of artificial general intelligence. Previous approaches typically generate a variety of partners to cover human policies, and then either train a single universal agent or maintain multiple best-response (BR) policies for different partners. However, the first direction struggles with the stochastic and multimodal nature of human behaviors, and the second relies on costly few-shot adaptations during policy deployment, which is unbearable in real-world applications such as healthcare and autonomous driving. Recognizing that human partners can easily articulate their preferences or behavioral styles through natural languages and make conventions beforehand, we propose a framework for Human-AI Coordination via Policy Generation from Language-guided Diffusion, referred to as Haland. Haland first trains BR policies for various partners using reinforcement learning, and then compresses policy parameters into a single latent diffusion model, conditioned on task-relevant language derived from their behaviors. Finally, the alignment between task-relevant and natural languages is achieved to facilitate efficient human-AI coordination. Empirical evaluations across diverse cooperative environments demonstrate that Haland generates agents with significantly enhanced zero-shot coordination performance, utilizing only natural language instructions from various partners, and outperforms existing methods by approximately 89.64%.

Supplementary Material: zip

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 6944

Loading