Representation learning and unsupervised skill discovery can allow robots to acquire diverse and reusable behaviors without the need for task-specific rewards. In this work, we learn a latent representation by maximizing the mutual information between skills and states subject to a distance constraint, using unsupervised reinforcement learning. Our method improves upon prior constrained skill discovery methods by replacing the latent transition maximization with a norm-matching objective. This not only results in much richer state-space coverage, but also allows the robot to learn more stable and easily controllable locomotion behaviors. In robotics this is particularly important, because transition-maximizing behaviors can result in highly dangerous motions. We successfully deployed the learned policy on a real ANYmal quadruped robot and demonstrated that the robot can accurately reach arbitrary points of the Cartesian state space in a zero-shot manner, using only an intrinsic skill discovery reward and standard regularization rewards.
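To make the norm-matching idea concrete, below is a minimal, hypothetical sketch of how such an intrinsic reward could be computed; it is not the paper's exact formulation. The encoder `phi`, the function `norm_matching_reward`, and the specific penalty form are illustrative assumptions: the latent transition is rewarded for aligning with the sampled skill while its norm is penalized for deviating from the skill's norm, rather than being maximized outright.

```python
import torch

# Hypothetical state encoder; in constrained skill discovery this is a
# learned network trained under a distance constraint on latent transitions.
phi = torch.nn.Linear(4, 2)

def norm_matching_reward(s, s_next, z):
    """Illustrative norm-matching intrinsic reward (an assumption, not the
    paper's exact objective).

    Encourages the latent transition phi(s') - phi(s) to align with the
    skill z while matching its norm, instead of maximizing the transition
    magnitude, which can drive unsafe, high-velocity motions."""
    delta = phi(s_next) - phi(s)                          # latent transition
    align = (delta * z).sum(-1)                           # directional alignment
    norm_gap = (delta.norm(dim=-1) - z.norm(dim=-1)).abs()  # norm mismatch
    return align - norm_gap

# Toy usage: a batch of 8 transitions in a 4-D state space with 2-D skills.
s, s_next = torch.randn(8, 4), torch.randn(8, 4)
z = torch.randn(8, 2)
print(norm_matching_reward(s, s_next, z))
```

Under this kind of objective, a small-norm skill maps to a small latent displacement, which is one way the claimed controllability could arise: the skill vector sets both the direction and the pace of motion.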