- Keywords: Reinforcement learning, Variational autoencoder, Disentangled representation learning
- Abstract: Hierarchical reinforcement learning (RL) typically requires task-agnostic, interpretable skills that are applicable to various downstream tasks. Although many recent works learn such skills for a policy in an unsupervised manner, the learned skills remain uninterpretable. To alleviate this, we propose a novel WEakly-supervised learning approach for learning Disentangled and Interpretable Skills (WEDIS) from the continuous latent representations of trajectories. We accomplish this by extending a trajectory variational autoencoder (VAE) to impose an inductive bias with weak labels, which explicitly enforces the trajectory representations to be disentangled into the factors of interest that we intend the model to learn. Treating the latent representations as skills, a skill-based policy network is trained to generate trajectories similar to those of the learned decoder of the trajectory VAE. Additionally, we propose to train the policy network on single-step transitions and perform trajectory-level behaviors at test time using knowledge of the skills, which simplifies the exploration problem during training. With a sample-efficient planning strategy based on the skills, we demonstrate that our method is effective at solving hierarchical RL problems in experiments on several challenging navigation tasks with long horizons and sparse rewards.
- One-sentence Summary: A weakly-supervised learning approach for learning disentangled and interpretable skills from the continuous latent representations of trajectories.
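To make the objective described in the abstract concrete, the following is a minimal sketch of a weakly-supervised trajectory-VAE loss: a standard ELBO (reconstruction plus KL) augmented with an alignment term that ties a designated subset of latent dimensions to weak labels. The function name, the loss weights `beta` and `lam`, and the squared-error form of the alignment term are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def wedis_style_loss(mu, logvar, recon_err, z_labeled, weak_labels,
                     beta=1.0, lam=1.0):
    """Per-sample loss: recon + beta * KL(q(z|tau) || N(0, I)) + lam * alignment.

    mu, logvar  : (batch, latent_dim) diagonal-Gaussian posterior parameters
    recon_err   : (batch,) trajectory reconstruction error from the decoder
    z_labeled   : (batch, k) the k latent dims reserved for labeled factors
    weak_labels : (batch, k) weak-supervision targets for those dims

    NOTE: a hypothetical sketch of the weakly-supervised objective; the
    paper's actual loss may differ in form and weighting.
    """
    # Closed-form KL divergence between a diagonal Gaussian and N(0, I).
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)
    # Weak-label alignment: push the designated latent dims toward the labels,
    # which is what forces those dims to encode the intended factors.
    align = np.sum((z_labeled - weak_labels) ** 2, axis=1)
    return recon_err + beta * kl + lam * align
```

In this sketch, only the first `k` latent dimensions are supervised; the remaining dimensions are shaped solely by the ELBO, which is one common way to impose a disentangling inductive bias with weak labels.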