Quality-Diversity Self-Play: Open-Ended Strategy Innovation via Foundation Models

Published: 22 Oct 2024 · Last Modified: 30 Oct 2024 · NeurIPS 2024 Workshop on Open-World Agents (Poster) · CC BY 4.0
Keywords: open-ended learning, self-play, quality-diversity, foundation models, policy search
TL;DR: We propose a novel algorithm, Quality-Diversity Self-Play (QDSP), that explores diverse and high-performing strategies in interacting (here, competing) populations.
Abstract: Multi-agent dynamics have powered innovation from time immemorial, from scientific breakthroughs during the space race to predator-prey dynamics in the natural world. The resulting landscape of interacting agents is a continually changing, interconnected, and complex mosaic of opportunities for innovation. Yet training innovative and adaptive artificial agents remains challenging. Self-Play algorithms bootstrap the complexity of their solutions by automatically generating a curriculum. Recent work has demonstrated the power of foundation models (FMs) as intelligent and efficient search operators. In this paper, we investigate whether combining the human-like priors and extensive knowledge embedded in FMs with multi-agent race dynamics can lead to rapid policy innovation in open-ended Self-Play algorithms. We propose a novel algorithm, Quality-Diversity Self-Play (QDSP), that explores diverse and high-performing strategies in interacting (here, competing) populations. We evaluate QDSP in a two-player asymmetric pursuer-evader simulation with code-based policies and show that QDSP surpasses high-performing human-designed policies. Furthermore, QDSP discovers better policies than quality-only or diversity-only Self-Play algorithms. Because QDSP explores new code-based strategies, the discovered policies draw on many distinct subfields of computer science and control, including reinforcement learning, heuristic search, model predictive control, tree search, and machine learning approaches. Combining multi-agent dynamics with the knowledge of FMs demonstrates a powerful new approach to efficiently creating a Cambrian explosion of diverse, performant, and complex strategies in multi-agent settings.
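To make the idea concrete, the sketch below shows one plausible way a quality-diversity self-play loop over code-based policies could be structured: a MAP-Elites-style archive per competing population, with a foundation model acting as the mutation operator. This is a minimal illustration under assumptions, not the paper's actual algorithm; all names (fm_mutate, evaluate, qdsp) and the behavior-descriptor scheme are hypothetical.

```python
import random

def fm_mutate(policy_code: str) -> str:
    """Hypothetical FM call: prompt a foundation model with a code policy
    and return a proposed variation. Stubbed here as the identity."""
    return policy_code

def evaluate(pursuer_code: str, evader_code: str):
    """Hypothetical simulation rollout: run the pursuer-evader episode and
    return (pursuer fitness, evader fitness, pursuer descriptor, evader descriptor).
    Stubbed with dummy values."""
    return 0.0, 0.0, (0,), (0,)

def qdsp(seed_pursuer: str, seed_evader: str, iterations: int = 100):
    # One archive per competing population, keyed by behavior descriptor,
    # storing (policy code, best fitness) for each behavioral niche.
    archives = {
        "pursuer": {(0,): (seed_pursuer, float("-inf"))},
        "evader": {(0,): (seed_evader, float("-inf"))},
    }

    for _ in range(iterations):
        # Pick a role to improve and sample an opponent from the other archive.
        role, other = random.sample(["pursuer", "evader"], 2)
        parent_code, _ = random.choice(list(archives[role].values()))
        opponent_code, _ = random.choice(list(archives[other].values()))

        # The foundation model proposes a new code-based strategy.
        child_code = fm_mutate(parent_code)

        # Evaluate the child against the sampled opponent.
        f_p, f_e, d_p, d_e = evaluate(
            child_code if role == "pursuer" else opponent_code,
            child_code if role == "evader" else opponent_code,
        )
        fitness, descriptor = (f_p, d_p) if role == "pursuer" else (f_e, d_e)

        # MAP-Elites-style insertion: keep the child if its niche is empty
        # or it beats the current occupant (quality within each diversity niche).
        incumbent = archives[role].get(descriptor)
        if incumbent is None or fitness > incumbent[1]:
            archives[role][descriptor] = (child_code, fitness)

    return archives
```

In this reading, the self-play pressure comes from always evaluating new policies against opponents drawn from the rival population's archive, while the archive insertion rule preserves both diverse and high-performing strategies.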
Submission Number: 40