StARformer: Transformer with State-Action-Reward Representations

29 Sept 2021 (modified: 22 Oct 2023) · ICLR 2022 Conference Withdrawn Submission
Keywords: Representation Learning, Reinforcement Learning, Transformer
Abstract: Reinforcement Learning (RL) can be considered a sequence modeling task: given a sequence of past state-action-reward experiences, a model autoregressively predicts a sequence of future actions. Recently, Transformers have been successfully adopted to model this problem. In this work, we propose the State-Action-Reward Transformer (StARformer), which explicitly models strongly related local causal relations to help improve action prediction in long sequences. StARformer first extracts local representations (i.e., StAR-representations) from each group of state-action-reward tokens within a very short time span. A sequence of such local representations, combined with state representations, is then used to make action predictions over a long time span. Our experiments show that StARformer outperforms the state-of-the-art Transformer-based method on Atari (image) and Gym (state vector) benchmarks, in both offline RL and imitation learning settings. StARformer also handles longer input sequences better than the baseline. The code will be released online.
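
To make the two-stage design described in the abstract concrete, here is a minimal sketch of the idea in PyTorch. This is not the authors' released implementation: the module names, dimensions, single-step grouping of state-action-reward tokens, and the use of standard `nn.TransformerEncoder` layers are all assumptions for illustration.

```python
# Minimal sketch of the two-stage StAR idea: fuse each (state, action, reward)
# group into a local token, then model a long sequence of local tokens
# interleaved with state tokens to predict actions.
# NOT the authors' implementation; names and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class StARSketch(nn.Module):
    def __init__(self, state_dim, act_dim, d_model=64):
        super().__init__()
        # Stage 1: fuse each (state, action, reward) group over a very short
        # time span (here: a single step) into one local representation.
        self.local_fuse = nn.Linear(state_dim + act_dim + 1, d_model)
        # Separate state embedding used alongside the local representations.
        self.state_embed = nn.Linear(state_dim, d_model)
        # Stage 2: model the long-range sequence of interleaved tokens.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(d_model, act_dim)

    def forward(self, states, actions, rewards):
        # states: (B, T, state_dim), actions: (B, T, act_dim), rewards: (B, T, 1)
        local = self.local_fuse(torch.cat([states, actions, rewards], dim=-1))
        global_s = self.state_embed(states)
        # Interleave local and state tokens along time: (B, 2T, d_model).
        tokens = torch.stack([local, global_s], dim=2).flatten(1, 2)
        L = tokens.size(1)
        # Additive causal mask so each position attends only to the past.
        causal = torch.triu(
            torch.full((L, L), float("-inf"), device=tokens.device), diagonal=1
        )
        h = self.temporal(tokens, mask=causal)
        # Predict the next action from the state-token positions (odd indices).
        return self.action_head(h[:, 1::2])
```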
One-sentence Summary: A Transformer that learns State-Action-Reward representations to improve sequence modeling in RL
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2110.06206/code)