Keywords: unsupervised reinforcement learning, offline reinforcement learning, world models
Abstract: Deep reinforcement learning has proven effective at solving many intricate tasks, yet it still struggles with data efficiency and generalization to novel scenarios. Recent approaches to these problems include (1) unsupervised pretraining of the agent in an environment without reward signals, and (2) training the agent on offline data from a variety of sources.
In this paper, we propose considering both approaches together and argue that this yields a more realistic setting, one where different types of data are available and fast online adaptation to new tasks is required.
Toward this goal, we extend the Unsupervised RL Benchmark to provide access to both unsupervised exploratory data and offline expert demonstrations when testing the agent's online performance on novel tasks in the environment. Using this setup, we address issues left open by previous work. Specifically, we show how to make unsupervised data more effective by using a reward predictor trained on a small amount of supervised offline and online data, and we demonstrate how world models can consolidate agent training across various types of data, leading to faster online adaptation.
Submission Number: 33