Pre-training as Batch Meta Reinforcement Learning with tiMe

Quan Vuong; Shuang Liu; Minghua Liu; Kamil Ciosek; Hao Su; Henrik Iskov Christensen

25 Sept 2019 (modified: 22 Jun 2025) | ICLR 2020 Conference Blind Submission | Readers: Everyone

Keywords: Reinforcement Learning, Deep Reinforcement Learning, Meta Reinforcement Learning, Batch Reinforcement Learning, Transfer Learning

TL;DR: Pre-training in RL from purely existing and observational data. Generalization to unseen MDPs.

Abstract: Pre-training is transformative in supervised learning: a large network trained on large, existing datasets can be used as an initialization when learning a new task. Such initialization speeds up convergence and leads to higher performance. In this paper, we ask how pre-training from only existing, observational data should be formalized in Reinforcement Learning (RL), and whether it is possible. We formulate the setting as Batch Meta Reinforcement Learning. We identify MDP mis-identification as a central challenge and motivate it with theoretical analysis. Combining ideas from Batch RL and Meta RL, we propose tiMe, which learns a distillation of multiple value functions and MDP embeddings from only existing data. On challenging control tasks, and without fine-tuning on unseen MDPs, tiMe is competitive with a state-of-the-art model-free RL method trained with hundreds of thousands of environment interactions.

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/pre-training-as-batch-meta-reinforcement/code)
