Bootstrapped Representations in Reinforcement Learning

Published: 03 Mar 2023, Last Modified: 12 Apr 2023
RRL 2023 Poster
Keywords: reusing pre-trained representations, auxiliary tasks
TL;DR: We theoretically characterize the representations learnt by pre-training on auxiliary tasks from offline datasets, quantify how well they linearly predict the value function for any downstream reward function, and propose new unsupervised pre-training rules.
Abstract: In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, pre-trained representations are often learnt from auxiliary tasks on offline datasets as part of an unsupervised pre-training phase, with the aim of improving the sample efficiency of deep RL agents in a subsequent online phase. Bootstrapping methods are today's method of choice to make these additional predictions, but it is unclear which features are being learned. In this paper, we address this gap and provide a theoretical characterization of the pre-trained representation learnt by temporal difference learning \citep{sutton1988learning}. Surprisingly, we find that this representation differs from the features learned by pre-training with Monte Carlo and residual gradient algorithms for most transition structures of the environment. We characterize how well these pre-trained representations can linearly predict the value function given any downstream reward function, and use our theoretical analysis to design new unsupervised pre-training rules. We complement our theoretical results with an empirical comparison of these pre-trained representations for different cumulant functions on the four-room \citep{sutton99between} and Mountain Car \citep{Moore90efficientmemory-based} domains, and demonstrate that they speed up online learning.
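To make the bootstrapping idea concrete, here is a minimal toy sketch (our own construction, not the paper's code) of TD(0) learning a linear value estimate on a 5-state deterministic chain; the bootstrapped target uses the current estimate of the next state's value rather than a full Monte Carlo return.

```python
import numpy as np

# Toy chain MDP: states 0..4, deterministic transition s -> s+1,
# reward 1 on entering the terminal state 4, discount gamma.
n_states = 5
gamma = 0.9
# One-hot (tabular) features for illustration; the paper studies
# richer, lower-dimensional representations.
phi = np.eye(n_states)

w = np.zeros(n_states)  # linear value weights: V(s) ~ w @ phi[s]
alpha = 0.1             # step size

for episode in range(2000):
    s = 0
    while s < n_states - 1:
        s_next = s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Bootstrapped TD(0) target: r + gamma * V(s'); terminal value is 0.
        v_next = 0.0 if s_next == n_states - 1 else w @ phi[s_next]
        td_error = r + gamma * v_next - w @ phi[s]
        w += alpha * td_error * phi[s]
        s = s_next

# True values for this chain: V(s) = gamma^(3 - s) for s in 0..3, V(4) = 0.
true_v = np.array([gamma ** (3 - s) for s in range(4)] + [0.0])
print(np.allclose(w, true_v, atol=1e-2))  # → True
```

A Monte Carlo variant would instead regress `w @ phi[s]` directly onto the observed discounted return from `s`; on this deterministic chain both converge to the same values, but, as the abstract notes, the *representations* they induce differ for most transition structures.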
Track: Technical Paper
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.