An Efficient End-to-End Training Approach for Zero-Shot Human-AI Coordination

Published: 21 Sept 2023, Last Modified: 06 Nov 2023 · NeurIPS 2023 poster
Keywords: Zero-Shot Coordination, Human-AI coordination, Training Efficiency, Partner Modeling
TL;DR: We propose an efficient end-to-end training approach for zero-shot human-AI coordination that incorporates a mixture partner policy and a partner modeling module.
Abstract: The goal of zero-shot human-AI coordination is to develop an agent that can collaborate with humans without relying on human data. Prevailing two-stage population-based methods require a diverse population of mutually distinct policies to simulate diverse human behaviors. The necessity of such populations severely limits their computational efficiency. To address this issue, we propose E3T, an **E**fficient **E**nd-to-**E**nd **T**raining approach for zero-shot human-AI coordination. E3T constructs the partner policy as a mixture of the ego policy and a random policy, making it both skilled at coordination and behaviorally diverse. In this way, the ego agent is trained end-to-end against this mixture policy without the need for a pre-trained population, significantly improving training efficiency. In addition, a partner modeling module is proposed to predict the partner's action from historical information. Given the predicted partner action, the ego agent can adapt its policy and act accordingly when collaborating with humans of different behavior patterns. Empirical results on the Overcooked environment show that our method significantly improves training efficiency while achieving performance comparable to or better than population-based baselines. Demo videos are available at https://sites.google.com/view/e3t-overcooked.
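The mixture partner policy described in the abstract can be sketched in a few lines: with some mixing probability the partner acts uniformly at random (supplying behavioral diversity), and otherwise it mirrors the current ego policy (supplying coordination skill). The sketch below is illustrative only; the names `ego_policy`, `mix_prob`, and `num_actions` are assumptions rather than the authors' code, and the full E3T method additionally conditions the ego policy on the partner modeling module's predicted partner action.

```python
import numpy as np

def sample_partner_action(ego_policy, obs, num_actions, mix_prob=0.5, rng=None):
    """Sample a training-partner action from a mixture policy (illustrative sketch).

    With probability `mix_prob` the partner acts uniformly at random
    (diversity); otherwise it follows a copy of the current ego policy
    (coordination skill). `mix_prob` is a hypothetical name for the
    mixing coefficient described in the abstract.
    """
    rng = rng or np.random.default_rng()
    if rng.random() < mix_prob:
        return int(rng.integers(num_actions))      # random-policy branch
    probs = ego_policy(obs)                        # ego-policy branch (self-play copy)
    return int(rng.choice(num_actions, p=probs))
```

Under this sketch, a single self-play loop suffices: the ego agent is updated while its training partner is generated on the fly from the mixture, which is what removes the need to pre-train a population of distinct partner policies.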
Submission Number: 5934