# Hindsight Divergence Minimization

This repository provides an implementation of Hindsight Divergence Minimization (HDM),
as proposed in the paper submission,
*Understanding Hindsight Goal Relabeling Requires Rethinking Divergence Minimization*.

If you use this codebase, please cite the anonymous paper submission.

## Setup

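Please set up a conda environment with the following package versions:
```bash
# Create and activate a Python 3.7.4 environment (the name "hdm" is arbitrary)
conda create -n hdm python=3.7.4
conda activate hdm

# Pinned dependencies; pip builds mpi4py against an MPI implementation (e.g. OpenMPI)
pip install torch==1.10.0 numpy==1.19.1 gym==0.13.1 mujoco_py==2.0.2.13 \
    tensorflow==1.13.1 mpi4py==3.0.3 pandas==1.1.1 joblib==0.16.0 box2d-py==2.3.8
```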
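Once the environment is built, it can help to compare installed versions against these pins. The following is a minimal sketch, not part of this repository: `PINS` and `find_mismatches` are hypothetical names, and it assumes each package exposes standard pip/conda distribution metadata.

```python
"""Report packages whose installed version differs from the Setup pins (hypothetical helper)."""
import sys

try:  # Python 3.8+
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7 (the pinned interpreter) lacks importlib.metadata
    import pkg_resources

    class PackageNotFoundError(Exception):
        """Stand-in for importlib.metadata.PackageNotFoundError."""

    def version(name):
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound as err:
            raise PackageNotFoundError(name) from err

# Pins copied from the Setup section (distribution names as pip sees them).
PINS = {
    "torch": "1.10.0",
    "numpy": "1.19.1",
    "gym": "0.13.1",
    "mujoco_py": "2.0.2.13",
    "tensorflow": "1.13.1",
    "mpi4py": "3.0.3",
    "pandas": "1.1.1",
    "joblib": "0.16.0",
    "box2d-py": "2.3.8",
}


def find_mismatches(pins):
    """Map each mismatched package to (pinned, installed); installed is None if absent."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:3] != (3, 7, 4):
        print("python: pinned 3.7.4, running", sys.version.split()[0])
    for name, (pinned, installed) in find_mismatches(PINS).items():
        print(f"{name}: pinned {pinned}, installed {installed or 'missing'}")
```

The comparison is an exact string match, so a local version suffix (for example `+cu113` on a CUDA build of torch) is reported as a mismatch even when the base version agrees.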

Follow the instructions in the [Goal-Conditioned Supervised Learning (GCSL) repo](https://github.com/dibyaghosh/gcsl)
to install the additional dependencies, which include:
 - the [MuJoCo physics simulator](https://mujoco.org) (which has been [open-sourced](https://github.com/deepmind/mujoco/releases)),
 - [multiworld](https://github.com/vitchyr/multiworld),
 - [rlutil](https://github.com/justinjfu/rlutil) for logging utility functions,
 - the [ROBEL learning environments](https://github.com/google-research/robel).

Then place the [`gcsl` folder](https://github.com/dibyaghosh/gcsl) inside this repo so that the directory layout looks like this:
```text
- code
    - gcsl
    - hdm
    - scripts
```
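The layout can be checked from the repository root (the `code` node in the tree above). This is a minimal sketch using only `pathlib`; `EXPECTED_DIRS` and `missing_dirs` are hypothetical helpers, not part of this repo.

```python
"""Check that gcsl, hdm, and scripts sit side by side under the repo root (hypothetical helper)."""
from pathlib import Path

EXPECTED_DIRS = ("gcsl", "hdm", "scripts")


def missing_dirs(root="."):
    """Return the expected subdirectories that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    print("layout OK" if not missing else "missing: " + ", ".join(missing))
```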

The training scripts are provided in the `scripts` folder.

## Development Notes

The current repo structure is as follows:

 - `hdm` (contains our implementation)
   - `agent` (defines the interface for an RL agent and its neural networks)
   - `algo` (defines the steps for environment sampling and the training loops)
   - `learn` (defines the optimization procedure)
   - `replay` (defines the replay buffer with hindsight relabeling functionality)
   - `utils` (utility functions for running the training code on multiple CPU cores, each with multiple threads)
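For readers new to hindsight relabeling, here is a minimal NumPy sketch of the standard "future" strategy from the original Hindsight Experience Replay paper. It is not the code in `hdm/replay`: it assumes fixed-length episodes stored as arrays and a sparse reward of 0 on success and -1 otherwise, and `sample_her_batch`, the dictionary keys, `future_p`, and `eps` are hypothetical names chosen for illustration.

```python
import numpy as np


def sample_her_batch(episodes, batch_size, future_p=0.8, eps=0.05, rng=None):
    """Sample transitions and relabel a fraction of goals with future achieved goals.

    Assumed episode layout (N episodes, horizon T):
      obs:     (N, T + 1, obs_dim)   observations
      ag:      (N, T + 1, goal_dim)  achieved goals
      g:       (N, T, goal_dim)      desired goals
      actions: (N, T, act_dim)
    """
    rng = np.random.default_rng() if rng is None else rng
    n_eps, horizon = episodes["actions"].shape[:2]

    # Uniformly pick an episode and a timestep for every transition in the batch.
    ep_idx = rng.integers(0, n_eps, size=batch_size)
    t = rng.integers(0, horizon, size=batch_size)
    batch = {
        "obs": episodes["obs"][ep_idx, t],
        "next_obs": episodes["obs"][ep_idx, t + 1],
        "actions": episodes["actions"][ep_idx, t],
        "ag_next": episodes["ag"][ep_idx, t + 1],
        "g": episodes["g"][ep_idx, t],  # advanced indexing returns a copy
    }

    # "future" strategy: swap in the goal achieved at a later step t' in (t, T].
    relabel = rng.random(batch_size) < future_p
    future_t = t + rng.integers(1, horizon - t + 1)
    batch["g"][relabel] = episodes["ag"][ep_idx[relabel], future_t[relabel]]

    # Recompute the sparse reward for the (possibly relabeled) goal.
    dist = np.linalg.norm(batch["ag_next"] - batch["g"], axis=-1)
    batch["r"] = -(dist > eps).astype(np.float32)
    return batch


if __name__ == "__main__":
    N, T = 8, 50
    rng = np.random.default_rng(0)
    episodes = {
        "obs": rng.normal(size=(N, T + 1, 10)),
        "ag": rng.normal(size=(N, T + 1, 3)),
        "g": rng.normal(size=(N, T, 3)),
        "actions": rng.normal(size=(N, T, 4)),
    }
    batch = sample_her_batch(episodes, batch_size=256, rng=rng)
    print({key: value.shape for key, value in batch.items()})
```

With `future_p=0.8`, four out of five sampled transitions are relabeled, which matches the `replay_k = 4` ratio (`future_p = 1 - 1 / (1 + replay_k)`) used by the OpenAI baselines HER sampler. Recomputing the reward after relabeling is what turns failed episodes into useful successes for the relabeled goals.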

Experiments are logged into an `experiment` folder when the scripts are launched.

## Acknowledgements

This implementation is partially based on the following repos:

 - [OpenAI baselines](https://github.com/openai/baselines)
 - [Goal-Conditioned Supervised Learning](https://github.com/dibyaghosh/gcsl)
 - [PyTorch implementation of HER](https://github.com/TianhongDai/hindsight-experience-replay)
 - [World Model as a Graph](https://github.com/LunjunZhang/world-model-as-a-graph)
