Keywords: Quality-Diversity, Reinforcement Learning
Abstract: A key aspect of intelligence is the ability to exhibit a wide range of behaviors to adapt to unforeseen situations. Designing artificial agents that are capable of showcasing a broad spectrum of skills is a long-standing challenge in Artificial Intelligence. In the last decade, progress in deep reinforcement learning has made it possible to solve complex tasks with high-dimensional, continuous state and action spaces. However, most approaches return only one highly specialized solution to a single problem. We introduce a Skill-Conditioned OPtimal Agent (SCOPA) that leverages successor features representations to learn a continuous range of skills that solve a task. We extend the generalized policy iteration framework with a policy skill improvement update based on successor features that is analogous to the classic policy improvement update. This novel skill improvement update makes it possible to learn to execute skills efficiently. From this result, we develop an algorithm that seamlessly unifies value function and successor features policy iteration with constrained optimization to (1) maximize performance, while (2) executing the desired skills. Compared with other skill-conditioned reinforcement learning methods, SCOPA reaches significantly higher performance and skill space coverage on challenging continuous control locomotion tasks with various types of skills. We also demonstrate that the diversity of skills is useful in five downstream adaptation tasks. Videos of our results are available at: https://bit.ly/scopa.
Submission Number: 17