Keywords: unsupervised RL, offline RL, fast adaptation, world models
Abstract: Deep reinforcement learning has proven effective at solving many intricate tasks, yet it still struggles with data efficiency and generalization to novel scenarios. Recent approaches to these problems include (1) unsupervised pretraining of the agent in an environment without reward signals, and (2) training the agent on offline data from various sources.
In this paper we propose to consider both approaches together, resulting in a setting where several types of data streams are available and fast online adaptation to new tasks is required.
Towards this goal we consider the Unsupervised Reinforcement Learning Benchmark and show that unsupervised data is better used as a source of exploration trajectories than for pretraining a policy. Following this observation, we develop a method that trains a world model as a smart offline buffer of exploration data. We show that this approach outperforms previous methods and yields adaptation that is ten times faster.
We then propose a setup in which the agent has access to both unsupervised exploratory data and offline expert demonstrations, and evaluate its online performance when adapting to novel tasks in the environment.
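To make the "world model as a smart offline buffer" idea concrete, below is a minimal sketch, not the paper's actual architecture or code: a dynamics model is fit to reward-free exploration transitions and then generates imagined rollouts relabelled with a new task's reward, which a policy could adapt against. All module names, dimensions, and the `task_reward` function are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's implementation): a world model
# trained on reward-free exploration data, reused as a "smart offline buffer"
# by relabelling imagined rollouts with a new task's reward for fast adaptation.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HID = 16, 4, 128  # illustrative sizes

class WorldModel(nn.Module):
    """Predicts the next observation from (obs, action); learned without rewards."""
    def __init__(self):
        super().__init__()
        self.dynamics = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, HID), nn.ReLU(), nn.Linear(HID, OBS_DIM)
        )

    def forward(self, obs, act):
        return self.dynamics(torch.cat([obs, act], dim=-1))

def train_world_model(model, exploration_batches, epochs=10, lr=1e-3):
    """Fit the dynamics on unsupervised exploration transitions (obs, act, next_obs)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for obs, act, next_obs in exploration_batches:
            loss = ((model(obs, act) - next_obs) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
    return model

def imagined_rollout(model, reward_fn, policy, start_obs, horizon=15):
    """Roll the policy inside the model and label states with the *new* task's
    reward -- the step that turns exploration data into task-specific experience."""
    obs, traj = start_obs, []
    with torch.no_grad():
        for _ in range(horizon):
            act = policy(obs)
            next_obs = model(obs, act)
            traj.append((obs, act, reward_fn(next_obs), next_obs))
            obs = next_obs
    return traj

if __name__ == "__main__":
    # Random tensors stand in for trajectories collected by an unsupervised agent.
    fake_batches = [
        (torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM), torch.randn(32, OBS_DIM))
        for _ in range(8)
    ]
    wm = train_world_model(WorldModel(), fake_batches)
    policy = nn.Sequential(nn.Linear(OBS_DIM, HID), nn.ReLU(), nn.Linear(HID, ACT_DIM))
    task_reward = lambda s: -s.norm(dim=-1, keepdim=True)  # hypothetical downstream task
    rollout = imagined_rollout(wm, task_reward, policy, torch.randn(32, OBS_DIM))
    print(f"imagined {len(rollout)} steps of task-labelled experience")
```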
Submission Number: 73