Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching

21 May 2021 (modified: 05 May 2023) · NeurIPS 2021 Submitted · Readers: Everyone
Keywords: unsupervised reinforcement learning, skill discovery, mutual information
Abstract: Learning meaningful behaviors in the absence of a task-specific reward function is a challenging problem in reinforcement learning. A desirable unsupervised objective is to learn a set of diverse skills that provide thorough coverage of the state space while being directed, i.e., reliably reaching distinct regions of the environment. At test time, an agent could then leverage these skills to solve sparse-reward problems by performing efficient exploration and finding an effective goal-directed policy with little-to-no additional learning. Unfortunately, it is challenging to learn skills with such properties: diffusing skills (e.g., stochastic policies that achieve good coverage) are not reliable in targeting specific states, whereas directed skills (e.g., goal-based policies) provide limited coverage. In this paper, inspired by the mutual information framework, we propose a novel algorithm designed to maximize coverage while enforcing a constraint on the directedness of each skill. In particular, we design skills with a decoupled policy structure: a first part trained to be directed, followed by a second, diffusing part that ensures local coverage. Furthermore, we leverage the directedness constraint to adaptively add or remove skills and to incrementally compose them along a tree that is grown to achieve thorough coverage of the environment. We illustrate how the learned skills enable an agent to efficiently solve sparse-reward downstream tasks in navigation and continuous-control environments, where our approach compares favorably with existing baselines.
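In the mutual information framework the abstract refers to, coverage and directedness correspond to the two terms of the common decomposition I(Z; S) = H(S) - H(S|Z): maximizing state entropy H(S) encourages coverage, while keeping H(S|Z) low makes each skill z reliably reach a distinct region. The decoupled policy structure can be made concrete with a minimal sketch. The following is an illustrative assumption of how a single skill might be rolled out, not the paper's implementation: the policy names, the switch step `t_direct`, and the Gym-style `env` interface are all hypothetical.

```python
def rollout_skill(env, directed_policy, diffusing_policy, t_direct, t_total):
    """Execute one skill: a directed phase that aims to reliably reach the
    skill's target region, then a diffusing phase for local coverage.
    (Illustrative sketch; all names are assumptions.)"""
    obs = env.reset()
    states = [obs]
    for t in range(t_total):
        if t < t_direct:
            action = directed_policy(obs)   # directed part: reach the target region
        else:
            action = diffusing_policy(obs)  # diffusing part: stochastic local coverage
        obs, _, done, _ = env.step(action)
        states.append(obs)
        if done:
            break
    return states
```

Under this sketch, the visited `states` from many rollouts of the same skill would concentrate around that skill's region (directedness) while still spreading locally (coverage), which is the trade-off the abstract describes.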
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
TL;DR: We propose a new unsupervised RL algorithm that discovers and composes skills to maximize state-space coverage while ensuring their directedness. It can cover hard-to-explore environments and few-shot learn goal-reaching policies on downstream tasks.
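To illustrate the incremental composition of skills mentioned above, here is a minimal sketch of one way skills could be grown along a tree under a directedness constraint. It assumes a hypothetical `directedness` estimate (e.g., the empirical probability that a skill reaches its target region) and a hypothetical `propose_skills` generator; the node structure and threshold are assumptions, not the paper's algorithm.

```python
class SkillNode:
    """A skill in the tree; executing a node means chaining the policies
    of its ancestors (root to node) before running its own policy."""
    def __init__(self, policy, parent=None):
        self.policy = policy
        self.parent = parent
        self.children = []

def grow_tree(root, propose_skills, directedness, threshold=0.9):
    """Greedily expand the tree: keep a candidate skill only if its
    estimated directedness (assumed reach probability) passes the
    threshold, then try to grow further from it."""
    frontier = [root]
    while frontier:
        node = frontier.pop()
        for policy in propose_skills(node):
            child = SkillNode(policy, parent=node)
            if directedness(child) >= threshold:  # directedness constraint
                node.children.append(child)
                frontier.append(child)
            # otherwise the candidate skill is discarded
    return root
```

In this sketch, removing a skill whose directedness degrades would amount to pruning its subtree, loosely matching the adaptive add/remove behavior the abstract describes.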
Supplementary Material: pdf