Student First Author: no
Keywords: Reinforcement Learning, Decision Transformers, Contrastive Learning
TL;DR: We present ConDT, a neural architecture for reinforcement learning that takes a novel approach to learning return-dependent transformations of a Decision Transformer's input embeddings and empirically outperforms prior work.
Abstract: Decision Transformers (DT) build on the success of Transformers by framing reinforcement learning as a target-return-conditioned sequence modeling problem. In this work, we claim that the distribution of DT's target returns represents a series of distinct tasks that an agent must learn to handle. Work in multi-task learning has shown that separating the representations of input data belonging to different tasks can improve performance. We draw on this approach to construct ConDT (Contrastive Decision Transformer). ConDT uses an enhanced contrastive loss to train a return-dependent transformation of the input embeddings, which we empirically show clusters these embeddings by their return. We find that ConDT significantly outperforms DT, by 10% in OpenAI Gym domains and by 39% in visually challenging Atari domains.
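A minimal sketch of the general idea, under stated assumptions: the abstract specifies neither ConDT's "enhanced" contrastive loss nor the form of its return-dependent transformation, so the code below substitutes the standard supervised contrastive (SupCon) loss and a hypothetical FiLM-style scale-and-shift modulation. Returns are discretized into bins (hypothetical `bin_returns`, with illustrative `edges`) so that embeddings in the same bin count as positive pairs; minimizing the loss pulls same-bin embeddings together, which mirrors the clustering-by-return behavior described above. All names and parameters (`return_dependent_transform`, `w`, `b`, `edges`, `temperature`) are illustrative and not taken from the paper.

```python
import numpy as np


def return_dependent_transform(emb, returns, w, b):
    """HYPOTHETICAL FiLM-style modulation (not ConDT's actual transform).

    emb: (N, D) input embeddings; returns: (N,) scalar returns;
    w, b: (D,) parameters controlling return-dependent gain and offset.
    """
    r = returns[:, None]                 # (N, 1)
    scale = 1.0 + np.tanh(r * w)         # (N, D) return-dependent gain
    shift = r * b                        # (N, D) return-dependent offset
    return emb * scale + shift


def bin_returns(returns, edges):
    """Discretize scalar returns into integer bins; same bin = positive pair."""
    return np.digitize(returns, edges)


def supcon_loss(z, labels, temperature=0.1):
    """Standard supervised contrastive loss, used here as a STAND-IN for
    ConDT's enhanced contrastive loss."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize rows
    sim = z @ z.T / temperature                        # scaled cosine similarities
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)            # drop i == j from denominator
    # Numerically stable row-wise log-softmax over all j != i
    row_max = np.max(sim, axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.sum(np.exp(sim - row_max), axis=1, keepdims=True))
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0                             # anchors with >= 1 positive
    # Mean log-probability assigned to each anchor's positives
    pos_sum = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    mean_log_pos = pos_sum[valid] / pos_counts[valid]
    return -mean_log_pos.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 4))
    returns = np.array([0.1, 0.2, 0.15, 0.9, 0.95, 0.85, 0.5, 0.55])
    w, b = rng.normal(size=4), rng.normal(size=4)
    z = return_dependent_transform(emb, returns, w, b)
    labels = bin_returns(returns, edges=np.array([0.33, 0.66]))
    print("contrastive loss:", supcon_loss(z, labels))
```

Binning turns the continuous return into class labels so that an off-the-shelf supervised contrastive loss applies; a variant that weights pairs by return distance would avoid hard bin boundaries, and the paper's actual choice may differ from either.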
Supplementary Material: zip