Self-Supervised Structured Representations for Deep Reinforcement LearningDownload PDF

Published: 28 Jan 2022, Last Modified: 13 Feb 2023ICLR 2022 SubmittedReaders: Everyone
Keywords: Reinforcement Learning, Representation Learning, Optical Flow Estimation, Structured representation, Self-supervised Learning
Abstract: Recent reinforcement learning (RL) methods have found extracting high-level features from raw pixels with self-supervised learning to be effective in learning policies. However, these methods focus on learning global representations of images, and disregard local spatial structures present in the consecutively stacked frames. In this paper, we propose a novel approach that learns self-supervised structured representations ($\mathbf{S}^3$R) for effectively encoding such spatial structures in an unsupervised manner. Given the input frames, the structured latent volumes are first generated individually using an encoder, and they are used to capture the change in terms of spatial structures, i.e., flow maps among multiple frames. To be specific, the proposed method establishes flow vectors between two latent volumes via a supervision by the image reconstruction loss. This enables for providing plenty of local samples for training the encoder of deep RL. We further attempt to leverage the structured representations in the self-predictive representations (SPR) method that predicts future representations using the action-conditioned transition model. The proposed method imposes similarity constraints on the three latent volumes; warped query representations by estimated flows, predicted target representations from the transition model, and target representations of future state. Experimental results on complex tasks in Atari Games and DeepMind Control Suite demonstrate that the RL methods are significantly boosted by the proposed self-supervised learning of structured representations. The code is available at https://sites.google.com/view/iclr2022-s3r.
One-sentence Summary: We propose a novel approach that learns self-supervised structured representations for effectively encoding spatial structures in an unsupervised manner.
19 Replies

Loading