Skill-Based Reinforcement Learning with Intrinsic Reward Matching

Published: 20 Jun 2024, Last Modified: 07 Aug 2024, TAFM@RLC 2024, CC BY 4.0
Track Selection: Full paper track.
Keywords: reinforcement learning, unsupervised reinforcement learning, skill discovery
TL;DR: Use the skill discriminator from skill pretraining to understand pretrained policies and use them on new tasks
Abstract: In reinforcement learning (RL), the reward function is a concise yet complete form of task specification. While often used to provide learning supervision to an RL agent, different reward functions can also characterize the varying behaviors in an environment by the optimal policies they induce. In unsupervised reinforcement learning, an agent autonomously learns a family of intrinsic reward functions and corresponding policies with shared latent skill codes. A skill discriminator parameterizes the intrinsic reward function with a neural network in which different skill codes correspond to different behaviors. With Intrinsic Reward Matching (IRM), we propose to use this often-discarded skill discriminator to understand and deploy the learned skill policies. Given a downstream task reward function, we use the EPIC reward comparison metric to compare the extrinsic reward function to the skill discriminator-parameterized intrinsic reward function, enabling us to determine which skills correspond to policies that are behaviorally similar to the optimal policies for the new task. We then optimize this metric as a black-box objective to find the best-matching skill and evaluate the corresponding skill policy on the downstream task. We demonstrate experimentally that the skill policies IRM selects achieve high zero-shot rewards on the Fetch Tabletop Manipulation and Franka Kitchen domains. Furthermore, we show how IRM can provide insight into the relationships between pretrained skills and downstream tasks.
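The skill-selection procedure the abstract describes can be illustrated with a minimal sketch: sample transitions, score each candidate skill code by how closely its discriminator-parameterized intrinsic reward matches the downstream task reward under an EPIC-style Pearson distance, and return the best-matching skill. The functions `intrinsic_reward`, `task_reward`, and `select_skill` below are hypothetical placeholders, not the authors' implementation, and EPIC's full canonical shaping step is simplified to a plain Pearson distance over sampled transitions.

```python
import numpy as np

# Hypothetical stand-ins; in the actual pipeline these would be the pretrained
# skill discriminator (intrinsic reward) and the downstream task reward.
def intrinsic_reward(skill, obs, next_obs):
    """Discriminator-parameterized intrinsic reward for a latent skill code."""
    return np.tanh(next_obs @ skill)                       # placeholder

def task_reward(obs, next_obs):
    """Extrinsic reward of the downstream task."""
    return -np.linalg.norm(next_obs[:, :2] - 1.0, axis=1)  # placeholder

def pearson_distance(r1, r2):
    """EPIC-style Pearson distance between two reward vectors: sqrt((1 - rho) / 2)."""
    rho = np.corrcoef(r1, r2)[0, 1]
    return np.sqrt(max(0.0, (1.0 - rho) / 2.0))

def epic_like_distance(skill, obs, next_obs):
    """Compare the skill's intrinsic reward to the task reward on sampled
    transitions; EPIC's canonical shaping step is omitted for brevity."""
    return pearson_distance(intrinsic_reward(skill, obs, next_obs),
                            task_reward(obs, next_obs))

def select_skill(candidate_skills, obs, next_obs):
    """Black-box search: pick the skill whose intrinsic reward best matches
    the downstream reward (lowest EPIC-style distance)."""
    dists = [epic_like_distance(z, obs, next_obs) for z in candidate_skills]
    return int(np.argmin(dists)), dists

# Example usage with random arrays standing in for sampled transitions.
rng = np.random.default_rng(0)
obs = rng.normal(size=(256, 8))
next_obs = obs + 0.1 * rng.normal(size=(256, 8))
skills = [rng.normal(size=8) for _ in range(16)]  # candidate skill codes
best_idx, dists = select_skill(skills, obs, next_obs)
print("best skill index:", best_idx, "distance:", round(dists[best_idx], 3))
```

In this sketch the selected skill policy would then be rolled out on the downstream task without further training, matching the zero-shot evaluation described in the abstract.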
Submission Number: 2