Ad-Hoc Human-AI Coordination Challenge

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 Spotlight Poster · CC BY 4.0
Abstract: Achieving seamless coordination between AI agents and humans is crucial for real-world applications, yet it remains a significant open challenge. Hanabi is a cooperative card game featuring imperfect information, constrained communication, theory of mind requirements, and coordinated action -- making it an ideal testbed for human-AI coordination. However, its use for human-AI interaction research has been limited by the cost and difficulty of human evaluation. In this work, we introduce the Ad-Hoc Human-AI Coordination Challenge (AH2AC2) to overcome the constraints of costly and difficult-to-reproduce human evaluations. We develop \textit{human proxy agents}, trained on a large-scale human dataset, that serve as robust, cheap, and reproducible human-like evaluation partners in AH2AC2. To encourage the development of data-efficient methods, we open-source a dataset of 3,079 games, deliberately limiting the amount of available human gameplay data. We present baseline results for both two- and three-player Hanabi scenarios. To ensure fair evaluation, we host the proxy agents through a controlled evaluation system rather than releasing them publicly. The code is available at \href{https://github.com/FLAIROx/ah2ac2}{https://github.com/FLAIROx/ah2ac2}.
Lay Summary: Making AI that can smoothly work with humans is a big challenge, especially because testing whether an AI is a good human teammate is often costly, inconsistent, and hard to repeat. To tackle this, we created the "Ad-Hoc Human-AI Coordination Challenge" (AH2AC2), using the cooperative card game Hanabi as a testing ground. We trained AI "stand-ins" (human proxies) on thousands of real human games to act like human players. Researchers can now test their own AI agents by having them play Hanabi with our proxies through a controlled online system, and we provide a small public dataset of human games to help them get started. The challenge offers a fair, affordable, and repeatable way to measure how well AIs coordinate with human-like partners. Our aim is to speed up progress in building AI that can truly collaborate with people, and that can learn to do so without needing massive amounts of human data.
Primary Area: Reinforcement Learning->Multi-agent
Keywords: multi-agent reinforcement learning, reinforcement learning, multi-agent systems, human-ai coordination, cooperative, challenge paper
Link To Code: https://github.com/FLAIROx/ah2ac2
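The sketch below illustrates the intended participant workflow described in the abstract: train an agent on the limited open-sourced human games, then submit it to the hosted evaluation system to play with the human proxy partners. It is a minimal, hypothetical outline only; all function and class names here (e.g. `load_open_dataset`, `evaluate_with_proxies`) are assumptions for illustration and are not the actual AH2AC2 API.

```python
# Hypothetical sketch of the AH2AC2 workflow -- NOT the repository's real API.
# Names and signatures below are illustrative assumptions.
import json


def load_open_dataset(path: str = "ah2ac2_games.json") -> list:
    """Load the open-sourced human gameplay data (3,079 games in the release)."""
    with open(path) as f:
        return json.load(f)  # e.g. a list of recorded Hanabi game trajectories


class MyHanabiAgent:
    """Placeholder for a participant's agent, trained on the limited human data."""

    def act(self, observation) -> int:
        # Return a legal Hanabi action index for the current observation.
        raise NotImplementedError


def evaluate_with_proxies(agent: MyHanabiAgent, num_players: int = 2) -> float:
    """Stand-in for the controlled evaluation service, which pairs the submitted
    agent with held-out human proxy partners and reports the mean game score."""
    # In the real challenge this happens server-side; proxies are not released.
    raise NotImplementedError


if __name__ == "__main__":
    games = load_open_dataset()          # limited data for data-efficient training
    agent = MyHanabiAgent()              # participant's agent
    score = evaluate_with_proxies(agent, num_players=2)
    print(f"Mean score with human proxies: {score:.2f}")
```

In practice, the evaluation step runs through the hosted system described in the paper rather than locally, which is what keeps the proxy agents private and the evaluation reproducible.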
Submission Number: 11488