RACCOON: Regret-based Adaptive Curricula for Cooperation

Published: 01 Jun 2024, Last Modified: 25 Jul 2024, Venue: CoCoMARL 2024 Oral, License: CC BY 4.0
Keywords: multi-agent reinforcement learning, unsupervised environment design, ad-hoc teamwork, autocurricula, zero-shot coordination
Abstract: Overfitting to training partners is a common problem in cooperative multi-agent reinforcement learning, leading to poor zero-shot transfer to novel partners. A popular solution is to train an agent with a diverse population of partners. However, previous work lacks a principled approach for selecting partners from this population during training, usually sampling at random. We argue that partner sampling is an important and overlooked problem, and motivated by the success of regret-based Unsupervised Environment Design, we propose Regret-based Adaptive Curricula for Cooperation (RACCOON), which prioritises high-regret partners and tasks. We test RACCOON in the Overcooked environment, and demonstrate that it leads to increased robustness and sample efficiency gains. We further analyse the nature of the induced curricula, and conclude with discussions on the limitations of cooperative regret and directions for future work.
Submission Number: 21
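
The abstract's core idea, prioritising high-regret partners and tasks, can be illustrated with a minimal sketch. The abstract does not specify the sampling rule or the regret estimator, so everything here is an assumption: the rank-based prioritisation is borrowed from the style used in regret-based Unsupervised Environment Design methods, and the names `regret_sampling_probs`, `sample_pair`, `temperature`, and the example regret table are hypothetical.

```python
import numpy as np

def regret_sampling_probs(regrets, temperature=0.3):
    """Rank-based prioritisation (assumed rule, not taken from the paper):
    entries with higher estimated regret receive higher sampling probability."""
    regrets = np.asarray(regrets, dtype=float)
    order = np.argsort(-regrets)                   # indices from highest to lowest regret
    ranks = np.empty(len(regrets), dtype=int)
    ranks[order] = np.arange(1, len(regrets) + 1)  # rank 1 = highest regret
    weights = (1.0 / ranks) ** (1.0 / temperature)
    return weights / weights.sum()

def sample_pair(regret_table, rng, temperature=0.3):
    """Sample a (partner, task) index pair from a 2-D table of regret estimates."""
    regret_table = np.asarray(regret_table, dtype=float)
    probs = regret_sampling_probs(regret_table.ravel(), temperature)
    flat_index = rng.choice(regret_table.size, p=probs)
    return np.unravel_index(flat_index, regret_table.shape)

# Hypothetical regret estimates: rows are partners, columns are tasks (e.g. layouts).
regret_table = np.array([[0.1, 0.4],
                         [2.0, 0.3],
                         [0.0, 0.9]])
rng = np.random.default_rng(0)
partner, task = sample_pair(regret_table, rng)
```

In a training loop, the regret table would be refreshed after each rollout, so the curriculum shifts as the agent improves with particular partners. Uniform random partner sampling, the baseline the abstract describes, corresponds to ignoring the table and giving every pair equal probability.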