# Offline Reinforcement Learning Dataset Generated by Unsupervised Learning
This repo demonstrates how unsupervised RL approaches (DIAYN, WURL) generate data for downstream offline RL algorithms. The performance of offline RL relies heavily on data diversity. Theoretical analysis suggests that diverse data, namely data with broad coverage of the state space, facilitates offline RL and reduces the severity of the out-of-distribution problem. Unsupervised RL can learn a set of diverse skills or policies and, from them, a correspondingly diverse set of samples. This project uses unsupervised RL algorithms to create datasets with higher diversity, which leads to higher performance of the subsequent offline RL algorithms.

## Backend
DIAYN and WURL are unsupervised RL algorithms that share the same backend, SAC (Soft Actor-Critic), a maximum-entropy RL method.
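
As a rough illustration of how a skill-discovery objective replaces the environment reward, the sketch below computes the DIAYN pseudo-reward `log q(z|s) - log p(z)` that the SAC backend would maximize. It is a minimal sketch, not this repo's code: the function names, the list-of-logits interface, and the uniform skill prior are assumptions made for the example.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def diayn_reward(discriminator_logits, skill, num_skills):
    """Pseudo-reward log q(z|s) - log p(z).

    discriminator_logits: hypothetical discriminator output for one state,
        one logit per skill.
    skill: index z of the skill that generated the state.
    num_skills: number of skills; the prior p(z) is assumed uniform.
    """
    log_q = log_softmax(discriminator_logits)[skill]
    log_p = -math.log(num_skills)
    return log_q - log_p
```

The reward is zero when the discriminator cannot tell the skills apart and grows toward `log(num_skills)` as the visited state identifies the skill more clearly, which pushes different skills toward different regions of the state space.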

## Requirements
- PyTorch 1.2.0
- Gym 0.18.0
- MuJoCo 131
- Python 3.7.5
- NumPy 1.17.4

Note: higher PyTorch versions may raise errors. The PointXXX environments require mujoco-maze (provided in a separate repo); install it with `cd mujoco-maze && pip install -e .`
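
Because newer PyTorch versions may raise errors, it can help to compare the installed packages against the pins before running anything. The snippet below is a minimal sketch of such a check and is not part of this repo: the helper names and the `PINNED` dictionary are hypothetical, and only the PyTorch and Gym pins are included.

```python
import importlib

# Pinned versions from the requirements above; keys are import names.
PINNED = {"torch": "1.2.0", "gym": "0.18.0"}


def installed_versions(module_names):
    """Return {name: version string, or None if the module cannot be imported}."""
    found = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
            continue
        found[name] = getattr(module, "__version__", None)
    return found


def version_mismatches(pinned, installed):
    """Return {name: (wanted, found)} for every package that differs from its pin."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pinned.items()
        # Some builds report a local suffix such as "+cpu"; compare the part before "+".
        if (installed.get(name) or "").split("+")[0] != wanted
    }
```

Calling `version_mismatches(PINNED, installed_versions(PINNED))` returns an empty dict when both pins are satisfied, and otherwise maps each offending package to the expected and found versions.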
