# D2C

This is a PyTorch implementation of "Diversify & Conquer: Outcome-directed Curriculum RL via Out-of-Distribution Disagreement".


## Setup Instructions
0. Create a conda environment:
```
conda env create -f d2c.yml
conda activate d2c
```


1. Install [PyTorch](https://pytorch.org/get-started/locally/) (tested with PyTorch 1.12.1 and CUDA 11.3).

2. Install dependencies:
```
./install.sh
```

3. Set `config_path` (see `config/paths/template.yaml`).

4. To run the robot arm environment, install [metaworld](https://github.com/rlworkgroup/metaworld):
```
pip install git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld
```
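
Once the steps above are done, a quick check can confirm that the installed PyTorch build is close to the tested one (1.12.1 with CUDA 11.3). The snippet below is a minimal sketch, not part of this repository; the variable name `TORCH_INFO` is only illustrative. It prints the PyTorch version, the CUDA version it was built against, and whether a GPU is visible, and falls back to a message if PyTorch cannot be imported.

```shell
# Query the active environment for its PyTorch build.
# TORCH_INFO is a hypothetical variable name used only in this sketch.
TORCH_INFO="$(python -c 'import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())' 2>/dev/null \
  || echo 'PyTorch is not importable in this environment')"

# Show the result: version, CUDA build, GPU visibility (or the fallback message).
echo "$TORCH_INFO"
```

Run it inside the activated `d2c` environment so that `python` resolves to the interpreter used for training.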



## Training and Evaluation (Default)


Ant
```
CUDA_VISIBLE_DEVICES=3 python d2c_train.py env=AntMazeComplex2Way-v0 d2c_kwargs.aux_weight=2 d2c_kwargs.train_every_k=4500 q_clip=true grad_norm_clipping=15.0 grad_value_clipping=-1.0 
```

Point Maze
```
CUDA_VISIBLE_DEVICES=3 python d2c_train.py env=Point2WaySpiralMaze-v0 d2c_kwargs.aux_weight=1 d2c_kwargs.train_every_k=2000 hgg_kwargs.trajectory_pool_kwargs.pool_length=1000 num_train_steps=2000000
```


Sawyer
```
CUDA_VISIBLE_DEVICES=3 python d2c_train.py env=sawyer_peg_pick_and_place d2c_kwargs.aux_weight=1 q_clip=true grad_norm_clipping=15.0 grad_value_clipping=-1.0 normalize_rl_obs=false sawyer_wall_env=true
```
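
The commands above pin each run to GPU index 3 through `CUDA_VISIBLE_DEVICES=3`, which only works on a machine that has at least four GPUs. The following is a minimal sketch of a wrapper that makes the GPU index configurable. `GPU_ID`, `DRY_RUN`, and `run_d2c` are hypothetical helper names introduced here and are not options of `d2c_train.py`; the overrides are passed through unchanged. By default the wrapper only prints the command it would run.

```shell
#!/usr/bin/env bash
# Hypothetical helper variables (not part of d2c_train.py):
GPU_ID="${GPU_ID:-0}"      # GPU index to expose to the training process
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the command, 0 = execute it

run_d2c() {
  # "$@" carries the overrides exactly as written in the commands above.
  if [ "$DRY_RUN" = "1" ]; then
    echo "CUDA_VISIBLE_DEVICES=$GPU_ID python d2c_train.py $*"
  else
    CUDA_VISIBLE_DEVICES="$GPU_ID" python d2c_train.py "$@"
  fi
}

# Point Maze run from above, on the selected GPU.
run_d2c env=Point2WaySpiralMaze-v0 d2c_kwargs.aux_weight=1 d2c_kwargs.train_every_k=2000 \
  hgg_kwargs.trajectory_pool_kwargs.pool_length=1000 num_train_steps=2000000
```

Set `DRY_RUN=0` to start training, and change `GPU_ID` to any index listed by `nvidia-smi -L`.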

Our code is sourced and modified from the official implementations of [OUTPACE](https://github.com/jayLEE0301/outpace_official.git) and the [HGG](https://github.com/Stilwell-Git/Hindsight-Goal-Generation) algorithm. We also use [mujoco-maze](https://github.com/kngwyu/mujoco-maze) and [metaworld](https://github.com/rlworkgroup/metaworld) to validate our proposed method.


