A Trajectory is Worth Three Sentences: Multimodal Transformer for Offline Reinforcement Learning

Published: 08 May 2023, Last Modified: 26 Jun 2023, UAI 2023
Keywords: Transformer, Multimodal, Sequence generation, Offline reinforcement learning
TL;DR: We encourage the community to view transformer-based offline reinforcement learning approaches from a multimodal perspective.
Abstract: Transformers hold tremendous promise in solving offline reinforcement learning (RL) by formulating it as a sequence modeling problem inspired by language modeling (LM). Prior works using transformers model a sample (trajectory) of RL as one sequence, analogous to a sequence of words (one sentence) in LM, despite the fact that each trajectory includes tokens from three diverse modalities: state, action, and reward, whereas a sentence contains only words. Rather than taking a modality-agnostic approach that uniformly models the tokens from different modalities as one sequence, we propose a multimodal sequence modeling approach in which a trajectory (one ``sentence'') of three modalities (state, action, reward) is disentangled into three unimodal ones (three ``sentences''). We investigate the correlation of different modalities during sequential decision-making and use the insights to design a multimodal transformer, named Decision Transducer (DTd). DTd outperforms prior art in offline RL on the D4RL benchmarks and enjoys better sample efficiency and algorithm flexibility. Our code is made publicly available \href{https://github.com/berniewang8177/Official-codebase-for-Decision-Transducer/}{here}.
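The abstract's central idea, splitting one trajectory "sentence" into three unimodal "sentences", can be illustrated with a minimal sketch. The following is a hypothetical PyTorch example of disentangling (state, action, reward) tokens into three separately encoded sequences; it is not the authors' Decision Transducer implementation, and all class names, dimensions, and hyperparameters below are assumptions for illustration only (see the linked repository for the official code).

```python
import torch
import torch.nn as nn

# Hypothetical sketch: encode each modality of a trajectory with its own
# small transformer encoder before any cross-modal interaction.
# NOT the official Decision Transducer (DTd) code.

class UnimodalEncoder(nn.Module):
    """Embeds one modality's tokens and encodes them as a standalone sequence."""
    def __init__(self, input_dim, embed_dim=128, n_layers=2, n_heads=4, max_len=1024):
        super().__init__()
        self.embed = nn.Linear(input_dim, embed_dim)
        self.pos = nn.Embedding(max_len, embed_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):                     # x: (batch, T, input_dim)
        t = torch.arange(x.size(1), device=x.device)
        h = self.embed(x) + self.pos(t)       # token + positional embeddings
        return self.encoder(h)                # (batch, T, embed_dim)


class MultimodalTrajectoryEncoder(nn.Module):
    """Disentangles a trajectory into three unimodal 'sentences'
    (rewards, states, actions) and encodes each one separately."""
    def __init__(self, state_dim, action_dim, embed_dim=128):
        super().__init__()
        self.reward_enc = UnimodalEncoder(1, embed_dim)
        self.state_enc = UnimodalEncoder(state_dim, embed_dim)
        self.action_enc = UnimodalEncoder(action_dim, embed_dim)

    def forward(self, states, actions, rewards):
        # states: (batch, T, state_dim); actions: (batch, T, action_dim);
        # rewards: (batch, T, 1)
        return (self.reward_enc(rewards),
                self.state_enc(states),
                self.action_enc(actions))


if __name__ == "__main__":
    enc = MultimodalTrajectoryEncoder(state_dim=17, action_dim=6)
    s, a, r = torch.randn(2, 20, 17), torch.randn(2, 20, 6), torch.randn(2, 20, 1)
    hr, hs, ha = enc(s, a, r)
    print(hr.shape, hs.shape, ha.shape)  # each: torch.Size([2, 20, 128])
```

In this sketch, any fusion of the three encoded streams (e.g., cross-attention between modalities) would happen in a later module; how DTd actually combines them is described in the paper, not here.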
Supplementary Material: pdf
Other Supplementary Material: zip