Efficient Human-AI Coordination via Preparatory Language-based Convention

Published: 11 Mar 2024, Last Modified: 22 Apr 2024
Venue: LLMAgents @ ICLR 2024 Poster
License: CC BY 4.0
Keywords: Human-AI coordination, Multi-agent Reinforcement Learning, Large Language Model, Coordination and Cooperation
TL;DR: We propose using a large language model (LLM) to develop an action plan (equivalently, a convention) that guides both humans and AI.
Abstract: Developing intelligent agents capable of seamless coordination with humans is a critical step towards artificial general intelligence. Existing methods for human-AI coordination typically train an agent to coordinate with a set of diverse policies or with human models fitted from real human data. However, the wide variety of human behavioral styles poses an obstacle for AI systems with limited capacity, and high-quality human data may not be readily available in real-world scenarios. In this study, we observe that, prior to coordination, humans communicate to establish conventions that specify individual roles and actions, so that their coordination proceeds in an orderly manner. Building on this observation, we propose using a large language model (LLM) to develop an action plan (equivalently, a convention) that guides both humans and AI. Given the task requirements, human preferences, the number of agents, and other pertinent information, the LLM generates a comprehensive convention that gives all parties a clear understanding of their tasks and responsibilities. Furthermore, we demonstrate that the LLM yields a more efficient coordination convention when human feedback is incorporated and the convention formulation problem is decomposed into sub-problems that are handled sequentially, each in a new session. Experimental evaluations in the Overcooked-AI environment using a human proxy model show that our method outperforms existing learning-based approaches. When coordinating with real humans, our method achieves better alignment with human preferences and an average performance improvement of 15% over the state of the art.
Submission Number: 18
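
Illustrative sketch (not the authors' implementation): the abstract describes feeding task requirements, human preferences, and the number of agents to an LLM, solving sub-problems sequentially in new sessions, and incorporating human feedback. The Python below shows one way such a loop might be organized. The names `LLMSession`, `build_prompt`, and `formulate_convention`, the prompt wording, and the way the partial convention is carried between sessions are all assumptions made for illustration.

```python
from typing import Callable, List, Optional

# Hypothetical interface: one call stands for one fresh LLM session
# (prompt in, reply out). Any real client could be wrapped to fit it.
LLMSession = Callable[[str], str]


def build_prompt(task: str, preferences: str, num_agents: int,
                 sub_problem: str, convention_so_far: str) -> str:
    """Assemble the inputs listed in the abstract into a single prompt."""
    return "\n".join([
        f"Task requirements: {task}",
        f"Human preferences: {preferences}",
        f"Number of agents: {num_agents}",
        f"Convention so far: {convention_so_far or '(none)'}",
        f"Current sub-problem: {sub_problem}",
        "Extend the convention with roles and actions for every agent.",
    ])


def formulate_convention(llm: LLMSession, task: str, preferences: str,
                         num_agents: int, sub_problems: List[str],
                         feedback: Optional[str] = None) -> str:
    """Solve sub-problems one after another, each in a new session."""
    convention = ""
    for sub_problem in sub_problems:
        # Assumption: only the partial convention text is passed forward;
        # no chat history is shared between sessions.
        convention = llm(build_prompt(task, preferences, num_agents,
                                      sub_problem, convention))
    if feedback:
        # Assumption: human feedback triggers one extra revision session.
        convention = llm(
            f"Convention: {convention}\n"
            f"Human feedback: {feedback}\n"
            "Revise the convention to address the feedback."
        )
    return convention
```

Passing any function that maps a prompt string to a reply string as `llm` is enough to exercise the loop. The resulting convention text would then be shown to the human and used to guide the AI agent, as the abstract describes.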