Beyond Fine-Tuning: Transferring Behavior in Reinforcement Learning

21 May 2021 (modified: 05 May 2023) · NeurIPS 2021 Submission
Keywords: deep reinforcement learning, transfer learning, unsupervised learning, exploration
TL;DR: We pre-train agents to maximize an exploration objective at scale in a reward-free setting, and show that the discovered behaviors can be used for efficient adaptation to downstream tasks without transferring neural network weights.
Abstract: Designing agents that acquire knowledge autonomously and use it to solve new tasks efficiently is an important challenge in reinforcement learning. Knowledge acquired during an unsupervised pre-training phase is often transferred by fine-tuning neural network weights once rewards are exposed, as is common practice in supervised domains. Given the nature of the reinforcement learning problem, we argue that standard fine-tuning strategies alone are not enough for efficient transfer in challenging domains. We introduce Behavior Transfer (BT), a technique that leverages pre-trained policies for exploration and that is complementary to transferring neural network weights. Our experiments show that, when combined with large-scale pre-training in the absence of rewards, existing intrinsic motivation objectives can lead to the emergence of complex behaviors. These pre-trained policies can then be leveraged by BT to discover better solutions than without pre-training, and combining BT with standard fine-tuning strategies results in additional benefits. The largest gains are generally observed in domains requiring structured exploration, including settings where the behavior of the pre-trained policies is misaligned with the downstream task.
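
The page exposes no implementation details beyond the abstract, but the core idea of Behavior Transfer, using a frozen pre-trained policy as an exploration device rather than as a weight initialization, can be sketched. The snippet below is a hypothetical illustration, not the authors' algorithm: the class `BehaviorTransferActor` and the parameters `beta` and `commit_steps` are assumed names, and `pretrained_policy` and `task_policy` stand in for any callables mapping observations to actions.

```python
import numpy as np

class BehaviorTransferActor:
    """Illustrative sketch: mix a frozen pre-trained policy into rollouts.

    With probability `beta`, the actor commits to the pre-trained behavior
    for `commit_steps` consecutive steps (temporally extended exploration);
    otherwise it follows the task policy being learned from reward.
    """

    def __init__(self, pretrained_policy, task_policy,
                 beta=0.1, commit_steps=10, seed=0):
        self.pretrained_policy = pretrained_policy  # frozen after reward-free pre-training
        self.task_policy = task_policy              # learned from the downstream reward
        self.beta = beta
        self.commit_steps = commit_steps
        self._remaining = 0                         # steps left in the current commitment
        self.rng = np.random.default_rng(seed)

    def act(self, observation):
        if self._remaining > 0:
            # Continue the ongoing exploration burst.
            self._remaining -= 1
            return self.pretrained_policy(observation)
        if self.rng.random() < self.beta:
            # Start a new burst driven by the pre-trained behavior.
            self._remaining = self.commit_steps - 1
            return self.pretrained_policy(observation)
        return self.task_policy(observation)
```

Data gathered this way can train `task_policy` with any standard RL algorithm, and annealing `beta` toward zero recovers the task policy alone; this is one plausible way a pre-trained behavior could aid exploration without transferring network weights.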
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
Supplementary Material: pdf