Bootstrapped Representations in Reinforcement Learning

Charline Le Lan; Stephen Tu; Mark Rowland; Anna Harutyunyan; Rishabh Agarwal; Marc G Bellemare; Will Dabney

Published: 03 Mar 2023, Last Modified: 12 Apr 2023, RRL 2023 Poster, Readers: Everyone

Keywords: reusing pretrained representations, auxiliary tasks

TL;DR: We theoretically characterize the representations learnt by pre-training on auxiliary tasks from offline datasets, describe how well they linearly predict the value function for any reward function, and propose new unsupervised pre-training rules.

Abstract: In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, pre-trained representations are often learnt from auxiliary tasks on offline datasets during an unsupervised pre-training phase, with the aim of improving the sample efficiency of deep RL agents in a subsequent online phase. Bootstrapping methods are today's method of choice for making these additional predictions, but it is unclear which features they learn. In this paper, we address this gap and provide a theoretical characterization of the pre-trained representation learnt by temporal difference learning (Sutton, 1988). Surprisingly, we find that for most transition structures of the environment, this representation differs from the features learned by pre-training with Monte Carlo and residual gradient algorithms. We characterize how well these pre-trained representations can linearly predict the value function for any downstream reward function, and use our theoretical analysis to design new unsupervised pre-training rules. We complement our theoretical results with an empirical comparison of these pre-trained representations for different cumulant functions on the four-room (Sutton et al., 1999) and Mountain Car (Moore, 1990) domains, and demonstrate that they speed up online learning.
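
The contrast between bootstrapped (TD), Monte Carlo and residual gradient pre-training can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a small tabular Markov chain whose random transition matrix stands in for P^π, uniform weighting over states, exact expected returns in place of sampled Monte Carlo returns, and full-batch gradient steps. Features Phi and linear weights W are trained jointly to predict the values of many random cumulants. The learned features are then scored by how well their span linearly fits the value function of an unseen downstream reward. All function names are hypothetical. The sketch makes no claim about which subspace each rule converges to, since that characterization is the subject of the paper.

```python
# Illustrative sketch only: function names, the random chain and uniform state
# weighting are assumptions, not the paper's code.
import numpy as np


def random_transition_matrix(n, seed=0):
    """Hypothetical environment: a random row-stochastic matrix standing in for P^pi."""
    rng = np.random.default_rng(seed)
    P = rng.random((n, n))
    return P / P.sum(axis=1, keepdims=True)


def pretrain_features(P, R, k, gamma, rule, lr=0.05, steps=3000, seed=0):
    """Jointly learn features Phi (n x k) and weights W (k x m) so that Phi @ W
    predicts the values of the m cumulants stored as the columns of R (n x m).

    rule="mc": regress on V = (I - gamma P)^{-1} R, the expected Monte Carlo return.
    rule="td": regress on the bootstrapped target R + gamma P Phi W, treated as a
               constant when differentiating (semi-gradient TD).
    rule="rg": minimise the Bellman residual ||R + gamma P Phi W - Phi W||^2,
               differentiating through the target as well (residual gradient).
    """
    n, m = R.shape
    rng = np.random.default_rng(seed)
    Phi = 0.1 * rng.standard_normal((n, k))
    W = 0.1 * rng.standard_normal((k, m))
    A = np.eye(n) - gamma * P
    V = np.linalg.solve(A, R)  # exact values of every cumulant
    for _ in range(steps):
        pred = Phi @ W
        if rule == "mc":
            G = V - pred                     # Monte Carlo error
        elif rule == "td":
            G = R + gamma * P @ pred - pred  # TD error, target held fixed
        elif rule == "rg":
            G = A.T @ (R - A @ pred)         # A^T times the TD error
        else:
            raise ValueError(f"unknown rule: {rule}")
        G = G / n  # average over uniformly weighted states
        # One simultaneous descent step on features and weights.
        Phi, W = Phi + lr * G @ W.T, W + lr * Phi.T @ G
    return Phi, W


def value_prediction_error(Phi, P, gamma, r):
    """Relative error of the best linear fit Phi @ theta to the value of reward r."""
    v = np.linalg.solve(np.eye(len(r)) - gamma * P, r)
    theta, *_ = np.linalg.lstsq(Phi, v, rcond=None)
    return np.linalg.norm(v - Phi @ theta) / np.linalg.norm(v)


if __name__ == "__main__":
    n, m, k, gamma = 20, 50, 4, 0.5
    P = random_transition_matrix(n)
    rng = np.random.default_rng(1)
    R = rng.standard_normal((n, m))  # random pre-training cumulants
    r_test = rng.standard_normal(n)  # downstream reward unseen during pre-training
    for rule in ("mc", "td", "rg"):
        Phi, _ = pretrain_features(P, R, k, gamma, rule)
        err = value_prediction_error(Phi, P, gamma, r_test)
        print(f"{rule}: relative value error {err:.3f}")
```

The three rules differ only in the regression target. The TD target R + γPΦW depends on the current features, so TD features need not coincide with the Monte Carlo ones even when both are trained on the same cumulants.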

Track: Technical Paper
