CaStRL: Context-Aware State Representation Learning with Transformer

22 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: representation learning, reinforcement learning, deep learning
TL;DR: This paper aims to learn generalizable state representations via a novel unsupervised representation pre-training approach.
Abstract: Learning a versatile representation from high-dimensional observation data is a crucial stepping stone for building autonomous agents capable of effective decision-making in various downstream tasks. Yet, learning such a representation without additional supervisory signals poses formidable practical challenges. In this work, we introduce Context-Aware State Representation Learning (CaStRL), a novel unsupervised representation pre-training approach that combines the strengths of generative autoregressive modeling with the pretraining-finetuning paradigm. To encourage CaStRL to grasp the underlying dynamics of the environment, we require it to jointly learn the latent state representation along with the contexts that influence the model's ability to learn a generalizable representation for control tasks. In CaStRL, we first employ the Video-Swin Transformer as a vision encoder, customizing it to support autoregressive modeling through the incorporation of a causal attention mask. Then, we design Context-GPT to learn context from historical sequences of state representations, which drives the model towards capturing global structural patterns by propagating information across extended time horizons. This significantly improves the adaptability of the learned representation for diverse control tasks. By emphasizing reward-free evaluation and limited-data constraints in both the pre-training and fine-tuning stages, we find, across a wide range of Atari experiments, that pre-trained representations can substantially facilitate downstream learning efficiency.
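The abstract mentions adapting the vision encoder for autoregressive modeling by incorporating a causal attention mask. As a minimal illustrative sketch of that general mechanism (not the paper's actual implementation, which builds on the Video-Swin Transformer), the mask below restricts each timestep to attend only to itself and earlier timesteps:

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Lower-triangular boolean mask: position i may attend to positions j <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_softmax(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Softmax over attention scores with disallowed (future) positions suppressed."""
    scores = np.where(mask, scores, -1e9)  # large negative -> ~0 weight after softmax
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# Toy example: uniform scores over a sequence of 4 timesteps.
weights = masked_softmax(np.zeros((4, 4)), causal_mask(4))
```

With uniform scores, each row spreads its attention evenly over the visible (past and current) positions, and future positions receive (numerically) zero weight, which is what makes next-step prediction well-posed in autoregressive training.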
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4468