Abstract: Open ad hoc teamwork presents the challenging problem of designing an autonomous agent that can rapidly adapt to collaborate with teammates without prior coordination in an open environment. Existing methods primarily rely on fixed, predefined teammate types, overlooking the fact that teammates may change dynamically. To address this limitation, we propose a novel reinforcement learning approach, the Open Online Teammate Adaptation Framework (Open-OTAF), which enables a controlled agent to collaborate with dynamic teammates in open ad hoc environments. To achieve this, the controlled agent employs a dual teamwork situation inference model to capture the current teamwork state, facilitating decision-making under partial observability. To handle the dynamic nature of teammate types, we first introduce a Chinese Restaurant Process-based model to categorize diverse teammate policies into distinct clusters, improving the efficiency of identifying teamwork situations. Next, to model heterogeneous agent relationships and accommodate a variable number of teammates, we represent the team as a heterogeneous graph and leverage heterogeneous graph attention neural networks to learn the representation of the teamwork situation. Extensive experiments across four challenging multi-agent benchmark tasks—Level-Based Foraging, Wolf-Pack, Cooperative Navigation, and FortAttack—demonstrate that our method successfully enables dynamic teamwork in open ad hoc settings. Open-OTAF outperforms state-of-the-art methods, achieving superior performance with faster convergence.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Dear Action Editor,
Thank you for your thorough review and the decision to accept our manuscript with minor revisions. We sincerely appreciate your constructive feedback, which has helped improve the clarity and rigor of our work.
1. We have carefully addressed all notation and phrasing errors mentioned by reviewers, especially Reviewer 69MY's points, ensuring consistency and precision throughout the manuscript.
2. Regarding future work, we added the following statement: "For future work, we will relax the CTDE assumption to enhance the applicability of our approach, investigating a robust ad hoc agent within a fully decentralized framework, and validating the method in real-world applications."
Please find the final revised manuscript attached. If you have any questions, please let us know. Thank you again for your time and valuable input.
Best regards,
The authors
Assigned Action Editor: ~quanming_yao1
Submission Number: 4631