URLB: Unsupervised Reinforcement Learning Benchmark

Aug 20, 2021 (edited Aug 27, 2021) | NeurIPS 2021 Datasets and Benchmarks Track (Round 2) | Readers: Everyone
  • Keywords: unsupervised learning, reinforcement learning, benchmark, open-source code
  • TL;DR: We present a benchmark for unsupervised reinforcement learning: we open-source code for eight leading unsupervised RL methods, standardize pre-training and evaluation, and benchmark across twelve downstream tasks.
  • Abstract: Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks. Training generalist agents that can quickly adapt to new tasks remains an outstanding challenge. Recent advances in unsupervised RL have shown that pre-training RL agents with self-supervised intrinsic rewards can result in efficient adaptation. However, these algorithms have been hard to compare and develop due to the lack of a unified benchmark. To this end, we introduce the Unsupervised Reinforcement Learning Benchmark (URLB). URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards. Building on the DeepMind Control Suite, we provide twelve continuous control tasks from three domains for evaluation and open-source code for eight leading unsupervised RL methods. We find that the implemented baselines make progress but are not able to solve URLB, and we propose directions for future research.
  • Supplementary Material: zip
  • URL: https://anonymous.4open.science/r/urlb