An Investigation into Value-Implicit Pre-training for Task-Agnostic, Sample-Efficient Goal-Conditioned Reinforcement Learning

Published: 03 Nov 2023, Last Modified: 27 Nov 2023, GCRL Workshop
Confirmation: I have read and confirm that at least one author will be attending the workshop in person if the submission is accepted
Keywords: goal-conditioned; reinforcement learning; robotic manipulation; value-implicit pre-training;
Abstract: One of the primary challenges in learning a diverse set of robotic manipulation skills from raw sensory observations is learning a universal reward function that transfers to unseen tasks. To address this challenge, a recent approach called value-implicit pre-training (VIP) has been proposed. VIP provides a self-supervised, pre-trained visual representation capable of generating dense and smooth reward functions for unseen robotic tasks. In this paper, we explore the feasibility of VIP's goal-conditioned reward specification, with the aim of achieving task-agnostic, sample-efficient goal-conditioned reinforcement learning (RL). Our investigation evaluates online RL using VIP-generated rewards in place of human-crafted reward signals on goal-image-specified robotic manipulation tasks from Meta-World under a highly limited interaction budget. We find that the combination of three techniques (augmenting VIP-generated rewards with sparse task-completion rewards, pre-training the policy on expert demonstrations via behavior cloning before RL training, and oversampling the demonstration data during RL training) accelerates online RL more than using VIP-generated rewards in isolation.
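As a rough illustration of the first technique (not code from the paper), the reward combination could be sketched as follows. Here `phi` is an illustrative stand-in for the pre-trained VIP visual encoder, and the dense term uses the standard VIP embedding-distance form; the function names, the `gamma` value, and the sparse bonus weight are all assumptions for this sketch.

```python
import numpy as np

def phi(obs):
    # Stand-in for the pre-trained VIP visual encoder (illustrative only).
    return np.tanh(obs)

def vip_reward(obs, next_obs, goal, gamma=0.98):
    # Dense reward in the usual VIP embedding-distance form:
    # S(o; g) = -||phi(o) - phi(g)||, reward = gamma * S(next_obs; g) - S(obs; g).
    s_next = -np.linalg.norm(phi(next_obs) - phi(goal))
    s_curr = -np.linalg.norm(phi(obs) - phi(goal))
    return gamma * s_next - s_curr

def combined_reward(obs, next_obs, goal, task_done, bonus=1.0):
    # Augment the dense VIP reward with a sparse task-completion bonus.
    return vip_reward(obs, next_obs, goal) + (bonus if task_done else 0.0)
```

The dense term shapes exploration toward the goal image before the task is solved, while the sparse bonus anchors the optimum at actual task completion.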
Submission Number: 19