Ad Hoc Teamwork via Offline Goal-Based Decision Transformers

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: This paper frames offline ad hoc teamwork as a sequence modeling problem and proposes goal-based Decision Transformers to train the ego agent for effective collaboration with unknown teammates.
Abstract: The ability of agents to collaborate with previously unknown teammates on the fly, known as ad hoc teamwork (AHT), is crucial in many real-world applications. Existing approaches to AHT require online interactions with the environment and carefully designed teammate policies. However, these prerequisites can be infeasible in practice. In this work, we extend the AHT problem to the offline setting, where the policy of the ego agent is learned directly from a multi-agent interaction dataset. We propose a hierarchical sequence modeling framework called TAGET that addresses critical challenges in the offline setting, including limited data, partial observability, and online adaptation. The core idea of TAGET is to dynamically predict teammate-aware rewards-to-go and sub-goals, so that the ego agent can adapt to changes in teammates' behaviors in real time. Extensive experimental results show that TAGET significantly outperforms existing solutions to AHT in the offline setting.
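To make the sequence-modeling idea concrete, here is a minimal, purely illustrative sketch of how a Decision-Transformer-style input sequence could be built when the conditioning tokens (return-to-go and sub-goal) are predicted from the interaction history rather than supplied by hand, which is the core idea the abstract describes. Everything below (the linear "heads" `W_rtg` and `W_goal`, the function names, the token layout) is an assumption for illustration; it is not TAGET's actual architecture, which the paper implements with transformers.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, GOAL_DIM = 4, 2

# Illustrative linear "heads" standing in for learned predictors;
# TAGET's real predictors are not shown here.
W_rtg = rng.normal(size=(OBS_DIM, 1))
W_goal = rng.normal(size=(OBS_DIM, GOAL_DIM))

def predict_conditioning(history_obs):
    """Predict the conditioning tokens (a teammate-aware return-to-go
    and a sub-goal) from the observation history so far. A mean-pooled
    linear map is a crude stand-in for a learned sequence model."""
    ctx = np.asarray(history_obs).mean(axis=0)  # summarize the history
    rtg = float(ctx @ W_rtg)                    # scalar return-to-go
    subgoal = ctx @ W_goal                      # sub-goal vector
    return rtg, subgoal

def build_sequence(history_obs, history_act):
    """Interleave per-timestep tokens Decision-Transformer style:
    (return-to-go, sub-goal, observation, action). The conditioning
    tokens are recomputed online at every step, which is what lets
    the ego agent react to shifts in teammate behavior."""
    tokens = []
    for t, obs in enumerate(history_obs):
        rtg, subgoal = predict_conditioning(history_obs[: t + 1])
        tokens.append(("rtg", rtg))
        tokens.append(("goal", subgoal))
        tokens.append(("obs", obs))
        if t < len(history_act):  # the latest action is yet to be chosen
            tokens.append(("act", history_act[t]))
    return tokens

# Three observed steps, two past actions: the model would next
# predict the action for the final timestep.
obs_hist = [rng.normal(size=OBS_DIM) for _ in range(3)]
act_hist = [0, 1]
seq = build_sequence(obs_hist, act_hist)
```

Because the return-to-go and sub-goal are regenerated from the growing history at every step, a change in a teammate's behavior alters the predicted conditioning tokens, and through them the action distribution, without any online fine-tuning.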
Lay Summary: This study proposes TAGET, a method that enables agents to learn how to cooperate with unknown teammates using offline data, without requiring real-time interaction. By predicting team-level goals, the agent is able to adapt dynamically to changing teammate behaviors, achieving strong performance in various cooperative tasks and showing potential for real-world applications such as autonomous driving and disaster response.
Primary Area: Reinforcement Learning->Multi-agent
Keywords: Ad Hoc Teamwork, Offline Reinforcement Learning
Submission Number: 3677