Learning transferable motor skills with hierarchical latent mixture policies

Dushyant Rao; Fereshteh Sadeghi; Leonard Hasenclever; Markus Wulfmeier; Martina Zambelli; Giulia Vezzani; Dhruva Tirumala; Yusuf Aytar; Josh Merel; Nicolas Heess; Raia Hadsell

Published: 28 Jan 2022, Last Modified: 13 Feb 2023, ICLR 2022 Spotlight, Readers: Everyone

Keywords: Robotics, Reinforcement Learning, Hierarchical, Latent Variable Models, Skills, Transfer

Abstract: For robots operating in the real world, it is desirable to learn reusable abstract behaviours that can effectively be transferred across numerous tasks and scenarios. We propose an approach to learn skills from data using a hierarchical mixture latent variable model. Our method exploits a multi-level hierarchy of both discrete and continuous latent variables, to model a discrete set of abstract high-level behaviours while allowing for variance in how they are executed. We demonstrate in manipulation domains that the method can effectively cluster offline data into distinct, executable behaviours, while retaining the flexibility of a continuous latent variable model. The resulting skills can be transferred to new tasks, unseen objects, and from state to vision-based policies, yielding significantly better sample efficiency and asymptotic performance compared to existing skill- and imitation-based methods. We also perform further analysis showing how and when the skills are most beneficial: they encourage directed exploration to cover large regions of the state space relevant to the task, making them most effective in challenging sparse-reward settings.

One-sentence Summary: An approach to learn reusable and transferable skills from data via a hierarchical latent mixture policy, which can significantly improve sample efficiency and asymptotic performance on downstream RL tasks
