- Keywords: Reinforcement learning, Variational autoencoder, Disentangled representation learning
- Abstract: Hierarchical reinforcement learning (RL) typically requires task-agnostic, interpretable skills that are applicable to various downstream tasks. Although many recent works learn such skills for a policy in an unsupervised manner, the learned skills remain uninterpretable. To alleviate this, we propose a novel WEakly-supervised learning approach for learning Disentangled and Interpretable Skills (WEDIS) from the continuous latent representations of trajectories. We accomplish this by extending a trajectory variational autoencoder (VAE) to impose an inductive bias with weak labels, which explicitly enforces the trajectory representations to be disentangled into the factors of interest that we intend the model to learn. Treating the latent representations as skills, a skill-based policy network is trained to generate trajectories similar to those of the learned decoder of the trajectory VAE. Additionally, we propose to train the policy network on single-step transitions and perform trajectory-level behaviors at test time using knowledge of the skills, which simplifies the exploration problem during training. With a sample-efficient planning strategy based on the skills, we demonstrate that our method is effective at solving hierarchical RL problems in experiments on several challenging navigation tasks with long horizons and sparse rewards.
- One-sentence Summary: A weakly-supervised learning approach for learning disentangled and interpretable skills from the continuous latent representations of trajectories.
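To make the objective described in the abstract concrete, the following is a minimal sketch of a weakly-supervised trajectory-VAE loss: a standard ELBO (reconstruction plus KL) augmented with an alignment term that ties a designated subset of latent dimensions to weak labels. The function name, the loss weights `beta` and `lam`, and the squared-error form of the alignment term are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def wedis_style_loss(mu, logvar, recon_err, z_labeled, weak_labels,
                     beta=1.0, lam=1.0):
    """Per-sample loss: recon + beta * KL(q(z|tau) || N(0, I)) + lam * alignment.

    mu, logvar  : (batch, latent_dim) diagonal-Gaussian posterior parameters
    recon_err   : (batch,) trajectory reconstruction error from the decoder
    z_labeled   : (batch, k) the k latent dims reserved for labeled factors
    weak_labels : (batch, k) weak-supervision targets for those dims

    NOTE: a hypothetical sketch of the weakly-supervised objective; the
    paper's actual loss may differ in form and weighting.
    """
    # Closed-form KL divergence between a diagonal Gaussian and N(0, I).
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)
    # Weak-label alignment: push the designated latent dims toward the labels,
    # which is what forces those dims to encode the intended factors.
    align = np.sum((z_labeled - weak_labels) ** 2, axis=1)
    return recon_err + beta * kl + lam * align
```

In this sketch, only the first `k` latent dimensions are supervised; the remaining dimensions are shaped solely by the ELBO, which is one common way to impose a disentangling inductive bias with weak labels.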