Unsupervised Model-based Pre-training for Data-efficient Reinforcement Learning from Pixels

Sai Rajeswar; Pietro Mazzaglia; Tim Verbelen; Alexandre Piché; Bart Dhoedt; Aaron Courville; Alexandre Lacoste

28 May 2022 (modified: 04 May 2025) | DARL 2022 | Readers: Everyone

Keywords: Unsupervised Learning, Model-based Reinforcement Learning, Sample-Efficiency

TL;DR: Our model-based approach combines exploration and planning to efficiently fine-tune unsupervised pre-trained models, thereby closing the performance gap in the Unsupervised RL Benchmark.

Abstract: Reinforcement learning (RL) aims at autonomously performing complex tasks. To this end, a reward signal is used to steer the learning process. While successful in many circumstances, the approach is typically data-hungry, requiring large amounts of task-specific interaction between agent and environment to learn efficient behaviors. To alleviate this, unsupervised RL proposes to collect data through self-supervised interaction to accelerate task-specific adaptation. However, whether current unsupervised strategies lead to improved generalization capabilities is still unclear, especially when the input observations are high-dimensional. In this work, we advance the field by closing the performance gap in the Unsupervised RL Benchmark, a collection of tasks to be solved in a data-efficient manner after interacting with the environment in a self-supervised way. Our approach uses unsupervised exploration to collect experience for pre-training a world model. Then, when fine-tuning for downstream tasks, the agent leverages the learned model and a hybrid planner to adapt efficiently to the given tasks, achieving results comparable to task-specific baselines while using 20x less data. We extensively evaluate our work, comparing several exploration methods and improving the fine-tuning process by studying the interactions between the learned components. Furthermore, we investigate the limitations of the pre-trained agent, gaining insights into how these influence the decision process and shedding light on new research directions.

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/unsupervised-model-based-pre-training-for/code)
