DisTop: Discovering a Topological representation to learn diverse and rewarding skills

29 Sept 2021 (modified: 13 Feb 2023), ICLR 2022 Conference Withdrawn Submission
Keywords: Hierarchical reinforcement learning, Representation learning, Developmental learning, Reinforcement learning
Abstract: One efficient way for a deep reinforcement learning agent to explore is to learn a set of skills that achieves a uniform distribution over terminal states. Building on this idea, we introduce DisTop, a new model that simultaneously learns diverse skills and focuses on improving the rewarding ones. DisTop progressively builds a discrete topology of the environment using an unsupervised contrastive loss, a growing network, and a goal-conditioned policy. Using this topology, a state-independent hierarchical policy selects where in the state space the agent should keep discovering skills; in turn, the newly visited states improve the learned representation. If the number of skills becomes too large, the agent can autonomously forget those unrelated to its eventual task. Our experiments emphasize that DisTop is agnostic to the ground state representation and that the agent can discover the topology of its environment whether the states are high-dimensional binary data, images, or proprioceptive inputs. We demonstrate that this paradigm is competitive with state-of-the-art algorithms on MuJoCo benchmarks, both for single-task dense-reward learning and for reward-free diverse skill discovery. By combining these two aspects, we show that DisTop outperforms a state-of-the-art hierarchical reinforcement learning algorithm when rewards are sparse. We believe DisTop opens new perspectives by showing that bottom-up skill discovery combined with representation learning can tackle different complex state spaces and reward settings when it is endowed with the ability to explicitly select which skills to improve.
One-sentence Summary: DisTop is a new reinforcement-learning agent that learns skills maximizing both the entropy of embedded states and extrinsic rewards by discovering the topology of its environment.
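The abstract outlines three interacting components: an unsupervised contrastive loss that embeds states, a growing network that discretizes the embedding into a topology of nodes, and a state-independent high-level policy that selects which region of that topology to keep improving. The sketch below illustrates these ideas in PyTorch under loose assumptions; the encoder architecture, the triplet-style contrastive objective, the node-insertion threshold, and the reward-weighted node selection (`StateEncoder`, `contrastive_loss`, `maybe_grow_topology`, `select_goal_node`) are hypothetical stand-ins for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateEncoder(nn.Module):
    """Maps raw states (flattened pixels, binary vectors, proprioception, ...)
    to a low-dimensional embedding."""
    def __init__(self, state_dim, embed_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, s):
        return self.net(s)

def contrastive_loss(encoder, anchors, positives, negatives, margin=1.0):
    """Triplet-style objective: pull states one transition apart together and
    push randomly sampled states apart, so embedding distance reflects reachability."""
    za, zp, zn = encoder(anchors), encoder(positives), encoder(negatives)
    pos_dist = (za - zp).pow(2).sum(dim=-1)
    neg_dist = (za - zn).pow(2).sum(dim=-1)
    return F.relu(pos_dist - neg_dist + margin).mean()

def maybe_grow_topology(nodes, z, threshold=0.5):
    """Growing-network step: create a new node when an embedded state is far
    from every existing node, so the discrete topology tracks newly visited regions."""
    if not nodes or min(torch.norm(z - n).item() for n in nodes) > threshold:
        nodes.append(z.detach())
    return nodes

def select_goal_node(node_values, temperature=1.0):
    """State-independent high-level choice: sample the node whose neighborhood
    the goal-conditioned policy should target next, biased toward rewarding nodes."""
    probs = F.softmax(torch.tensor(node_values) / temperature, dim=0)
    return torch.multinomial(probs, 1).item()
```

In such a setup, the sampled node would serve as an intrinsic goal for the low-level goal-conditioned policy, and the states it reaches would feed back into the contrastive loss and the growing network, mirroring the loop described in the abstract.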
Supplementary Material: zip