Skill-Conditioned Policy Optimization with Successor Features Representations

19 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: Reinforcement Learning, Quality-Diversity
Abstract: A key aspect of intelligence is the ability to exhibit a wide range of behaviors to adapt to unforeseen situations. Designing artificial agents that are capable of showcasing a broad spectrum of skills is a long-standing challenge in Artificial Intelligence. In the last decade, progress in deep Reinforcement Learning (RL) has made it possible to solve complex tasks with high-dimensional, continuous state and action spaces. However, most approaches return only one highly specialized solution to a single problem. We introduce a Skill-Conditioned Optimal Agent (SCOPA) that leverages successor features representations to learn skills that solve a task. We derive a policy skill improvement update with successor features, analogous to the classic policy improvement update, which we use to learn skills. From this result, we develop an algorithm that combines successor features with universal function approximators to learn a skill representation that extends the traditional concept of a goal to trajectory-based skills. We unify value function and successor features policy iteration with constrained optimization to (1) maximize performance while (2) executing a skill. Compared with other skill-conditioned RL methods, SCOPA reaches significantly higher performance and skill space coverage on challenging continuous control locomotion tasks with various types of skills. We also demonstrate that the diversity of skills is useful in downstream adaptation tasks. Videos of our results are available at: http://bit.ly/scopa.
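For readers unfamiliar with the term, the snippet below is a minimal sketch of the standard successor features quantity that the abstract builds on: the discounted sum of feature vectors along a trajectory. It is not the SCOPA algorithm. The function names, the toy feature vectors, the weight vector `w`, and the discount `gamma` are all hypothetical choices made for illustration. The sketch also checks the well-known identity that, when the reward is linear in the features, the discounted return equals the dot product of the successor features with the reward weights.

```python
from typing import List, Sequence


def successor_features(features: Sequence[Sequence[float]], gamma: float) -> List[float]:
    """Monte Carlo estimate: discounted sum of feature vectors along one trajectory."""
    psi = [0.0] * len(features[0])
    # Accumulate backwards in time: psi_t = phi_t + gamma * psi_{t+1}
    for phi in reversed(features):
        psi = [p + gamma * q for p, q in zip(phi, psi)]
    return psi


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted sum of scalar rewards along one trajectory."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical toy trajectory with 2-dimensional features (illustration only).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [2.0, -1.0]   # assumed reward weights, so that reward = dot(phi, w)
gamma = 0.5       # assumed discount factor

psi = successor_features(features, gamma)
rewards = [dot(phi, w) for phi in features]

# With linear rewards, both sides of the identity agree.
print(psi, dot(psi, w), discounted_return(rewards, gamma))
```

In a skill-conditioned setting, a vector such as `psi` (suitably normalized) can serve as a summary of the behavior a trajectory exhibits, which is the sense in which the abstract speaks of extending goals to trajectory-based skills; how SCOPA uses it in its constrained update is specified in the paper, not here.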
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 1773