Temporal abstractions-augmented temporally contrastive learning: an alternative to the Laplacian in RL

Published: 28 Jan 2022 · Last Modified: 30 May 2024 · ICLR 2022 Submitted · Readers: Everyone
Keywords: Representation learning, Laplacian, self-supervised, exploration
Abstract: In reinforcement learning (RL), the graph Laplacian has proved to be a valuable tool in the task-agnostic setting, with applications ranging from option discovery to dynamics-aware metric learning. Conveniently, learning the Laplacian representation has recently been framed as the optimization of a temporally-contrastive objective to overcome its computational limitations in large or even continuous state spaces (Wu et al., 2019). However, this approach relies on uniform access to the state space S, and overlooks the exploration problem that emerges during the representation learning process. In this work, we reconcile such representation learning with exploration in a non-uniform prior setting, while recovering the expressive potential afforded by a uniform prior. Our approach leverages the learned representation to build a skill-based covering policy, which in turn provides a better training distribution to extend and refine the representation. We also propose to integrate the temporal abstractions captured by the learned skills into the representation, which encourages exploration and improves the representation’s dynamics-awareness. We find that our method scales better to challenging environments, and that the learned skills can solve difficult continuous navigation tasks with sparse rewards, where standard skill discovery methods are limited.
One-sentence Summary: An alternative to Laplacian-like representations in exploration-demanding settings.
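As background for the abstract, here is a minimal sketch of the temporally-contrastive Laplacian objective of Wu et al. (2019) that the submission builds on: an attractive term pulling embeddings of temporally adjacent states together, plus a penalty approximating the orthonormality constraint to prevent collapse. The function name, batch layout, and the beta coefficient below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def contrastive_laplacian_loss(phi_s, phi_s_next, phi_u, phi_v, beta=1.0):
    """Sketch of a temporally-contrastive Laplacian objective (after Wu et al., 2019).

    phi_s, phi_s_next: embeddings of temporally adjacent states, shape (batch, d)
    phi_u, phi_v:      embeddings of states drawn independently from the
                       training distribution, shape (batch, d)
    beta:              weight of the orthonormality penalty (assumed hyperparameter)
    """
    d = phi_s.shape[1]
    # Attractive term: embeddings of consecutive states should be close.
    attract = np.mean(np.sum((phi_s - phi_s_next) ** 2, axis=1))
    # Repulsive term: penalty approximating the orthonormality constraint
    # <f_i, f_j> = delta_ij over the sampling distribution, preventing collapse.
    dots = np.sum(phi_u * phi_v, axis=1)
    norms_u = np.sum(phi_u ** 2, axis=1)
    norms_v = np.sum(phi_v ** 2, axis=1)
    repulse = np.mean(dots ** 2 - norms_u - norms_v) + d
    return attract + beta * repulse

# Tiny usage example with random embeddings (shapes only, no learning loop).
rng = np.random.default_rng(0)
phi = rng.normal(size=(128, 8))
loss = contrastive_laplacian_loss(phi[:-1], phi[1:], phi, phi[::-1], beta=5.0)
```

The abstract's contribution is to replace the uniform state-sampling assumed by this objective with a skill-based covering policy, and to fold the resulting temporal abstractions back into the representation.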