Keywords: Offline Meta Reinforcement Learning, Prompt Tuning, Transformer
TL;DR: This paper explores how prompts help sequence-modeling based offline-RL algorithms
Abstract: Recently, the pretrain-tuning paradigm in large-scale sequence models has made significant progress in Natural Language Processing and Computer Vision. However, such a paradigm is still hindered by intractable challenges in Reinforcement Learning (RL), including the lack of self-supervised large-scale pretraining methods based on offline data and efficient fine-tuning/prompt-tuning over unseen downstream tasks. In this work, we explore how prompts can help sequence-modeling-based offline Reinforcement Learning (offline-RL) algorithms. Firstly, we propose prompt tuning for offline RL, where a context vector sequence is concatenated with the input to guide the conditional generation. As such, we can pretrain a model on the offline dataset with supervised loss and learn a prompt to guide the policy to play the desired actions. Secondly, we extend the framework to the Meta-RL setting and propose Contextual Meta Transformer (CMT), which leverages the context among different tasks as the prompt to improve the performance on unseen tasks. We conduct extensive experiments across three different offline-RL settings: offline single-agent RL on the D4RL dataset, offline Meta-RL on the MuJoCo benchmark, and offline MARL on the SMAC benchmark; the results validate the strong performance, high computation efficiency, and generality of our methods.