Keywords: ad hoc teamwork, robust reinforcement learning, multi-agent reinforcement learning
TL;DR: We study worst-case priors over partners as a means to achieve robust ad hoc teamwork in coordination tasks.
Abstract: Learning policies for Ad Hoc Teamwork (AHT) is challenging. Most standard methods choose a specific distribution over training partners, which is assumed to mirror the distribution over partners after deployment. Moreover, they offer limited guarantees on worst-case performance. To tackle this issue, we propose using a worst-case prior distribution by adapting ideas from minimax-Bayes analysis to AHT.
We thereby explicitly account for our uncertainty about the partners at test time. Extensive experiments, including evaluations on coordination tasks from the Melting Pot suite, show our method's superior robustness compared to self-play, fictitious play, and best-response learning with respect to policy populations. This highlights the importance of selecting an appropriate training distribution over teammates to achieve robustness in AHT.
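To make the minimax-Bayes idea concrete, here is a minimal sketch (not the paper's implementation) of computing a worst-case prior over a finite partner population. It assumes a hypothetical precomputed payoff matrix `V`, where `V[i, j]` is the return of ego policy `i` when paired with partner `j`, and solves the resulting zero-sum game with multiplicative-weights updates against the ego agent's best response:

```python
import numpy as np

def worst_case_prior(V, iters=5000, lr=0.1):
    """Approximate the prior over partners that minimizes the ego
    agent's best-response value (a zero-sum matrix game)."""
    n_ego, n_partners = V.shape
    prior = np.full(n_partners, 1.0 / n_partners)  # start uniform
    avg_prior = np.zeros(n_partners)
    for _ in range(iters):
        # Ego best-responds to the current prior over partners.
        br = np.argmax(V @ prior)
        # Adversary shifts mass toward partners that hurt that best response.
        prior *= np.exp(-lr * V[br])
        prior /= prior.sum()
        avg_prior += prior
    # The averaged iterate approximates the equilibrium (worst-case) prior.
    return avg_prior / iters

# Toy example: 3 ego policies, 4 candidate partners.
rng = np.random.default_rng(0)
V = rng.uniform(size=(3, 4))
prior = worst_case_prior(V)
print("worst-case prior:", prior.round(3))
print("best-response value under it:", (V @ prior).max().round(3))
```

Training against such a prior, rather than a fixed or uniform one, is what gives the worst-case guarantee: no distribution over the candidate partners can degrade the ego agent's expected return below the value found here.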
Submission Number: 110