Multi-Environment Pretraining Enables Transfer to Action Limited Datasets

David Venuto; Sherry Yang; Pieter Abbeel; Doina Precup; Igor Mordatch; Ofir Nachum

Published: 03 Mar 2023, Last Modified: 12 Apr 2023, RRL 2023 Poster, Readers: Everyone

Abstract: Using massive datasets to train large-scale models has emerged as a dominant approach for broad generalization in natural language and vision applications. In reinforcement learning, however, a key challenge is that available data of sequential decision making is often not annotated with actions. For example, videos of game-play are far more widely available than sequences of frames paired with their logged game controls. We propose to circumvent this challenge by combining large but sparsely-annotated datasets from a \emph{target} environment of interest with fully-annotated datasets from various other \emph{source} environments. Our method, Action Limited PreTraining (ALPT), leverages the generalization capabilities of inverse dynamics modelling (IDM) to label missing action data in the target environment. We show that utilizing even one additional environment dataset of labelled data during IDM pretraining gives rise to substantial improvements in generating action labels for unannotated sequences. We evaluate our method on Atari game-playing environments and show that, with target environment data equivalent to only $12$ minutes of gameplay, we can significantly improve game performance and generalization capability compared to other approaches. Furthermore, we show that ALPT remains beneficial even when target and source environments share no common actions, highlighting the importance of pretraining on broad datasets even when they might seem irrelevant to the target task at hand.
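The labelling step described in the abstract can be illustrated with a deliberately tiny sketch. This is not the authors' model or code: the one-dimensional states, the action names, the mean-state-change "model", and the helper names `fit_idm`, `predict_action`, and `label_sequences` are all assumptions made for illustration. Pooling source and target data into one fit is a simplification standing in for pretraining on source environments followed by adaptation to the limited labelled target data.

```python
from collections import defaultdict

def fit_idm(transitions):
    """Fit a toy inverse dynamics model from (state, next_state, action) triples.

    Hypothetical stand-in for a learned IDM: it stores the mean state
    change observed for each action.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for s, s_next, a in transitions:
        sums[a] += s_next - s
        counts[a] += 1
    return {a: sums[a] / counts[a] for a in sums}

def predict_action(idm, s, s_next):
    """Return the action whose mean state change is closest to the observed one."""
    delta = s_next - s
    return min(idm, key=lambda a: abs(idm[a] - delta))

def label_sequences(idm, unlabelled):
    """Attach predicted actions to unlabelled (state, next_state) pairs."""
    return [(s, s_next, predict_action(idm, s, s_next)) for s, s_next in unlabelled]

# Fully annotated transitions from a (made-up) source environment.
source_labelled = [
    (0.0, 1.0, "right"),
    (2.0, 1.0, "left"),
    (5.0, 5.0, "noop"),
    (3.0, 4.1, "right"),
]
# The small annotated portion of the (made-up) target environment.
target_labelled = [(10.0, 9.1, "left"), (7.0, 7.0, "noop")]
# The large action-free portion of the target environment.
target_unlabelled = [(1.0, 2.0), (4.0, 3.0), (6.0, 6.05)]

# Assumption: pooling the two labelled sets approximates pretrain-then-adapt.
idm = fit_idm(source_labelled + target_labelled)
pseudo_labelled = label_sequences(idm, target_unlabelled)
print(pseudo_labelled)
```

In this toy setting the pseudo-labelled transitions would then serve as training data for a downstream policy, which the sketch leaves out.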

Track: Technical Paper

Supplementary Material: zip
