# Unsupervised-to-Online RL (U2O RL)

## Requirements
* Python 3.8
* MuJoCo 2.1.0

## Installation
```
conda create --name u2o_gcrl python=3.8
conda activate u2o_gcrl
pip install -r requirements.txt --no-deps
pip install "jax[cuda11_cudnn82]==0.4.3" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```

## Examples
```
# U2O RL on antmaze-large-diverse
python offline_online_u2o.py --run_group EXP --agent_name hilp2iql --algo_name hilp2iql --seed 0 --env_name antmaze-large-diverse-v2 --skill_temperature 10 --skill_expectile 0.9 --num_pretraining_steps 1000000 --save_interval 1000000 --max_steps 1000000

# U2O RL on antmaze-ultra-diverse
python offline_online_u2o.py --run_group EXP --agent_name hilp2iql --algo_name hilp2iql --seed 0 --env_name antmaze-ultra-diverse-v0 --skill_temperature 10 --skill_expectile 0.9 --num_pretraining_steps 1000000 --save_interval 1000000 --max_steps 1000000

# U2O RL on kitchen-partial
python offline_online_u2o.py --run_group EXP --agent_name hilp2iql --algo_name hilp2iql --seed 0 --env_name kitchen-partial-v0 --skill_temperature 0.5 --skill_expectile 0.7 --num_pretraining_steps 500000 --save_interval 500000 --max_steps 500000

# U2O RL on visual-kitchen-partial (render the kitchen-partial dataset first, then train)
mkdir -p data/d4rl_kitchen_rendered
python dataset_render.py --env_name kitchen-partial-v0
python offline_online_u2o.py --run_group EXP --agent_name hilp2iql --algo_name hilp2iql --seed 0 --env_name visual-kitchen-partial-v0 --skill_temperature 0.5 --skill_expectile 0.7 --encoder impala_small --p_aug 0.5 --num_pretraining_steps 500000 --save_interval 500000 --max_steps 500000
```
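
To repeat one of the runs above over several seeds, a small wrapper loop can help. The following is a minimal sketch, not part of this repository: the `run_seed` helper, the `DRY_RUN` variable, and the seed values 0 to 2 are assumptions made for illustration, and the flags are copied from the antmaze-large-diverse example. By default it only prints the commands so they can be checked before launching.

```bash
#!/usr/bin/env bash
# Hypothetical seed sweep for the antmaze-large-diverse example.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 runs them one after another.
set -euo pipefail

run_seed() {
  local seed="$1"
  # Flags copied from the example above; only --seed varies.
  local cmd=(python offline_online_u2o.py --run_group EXP
             --agent_name hilp2iql --algo_name hilp2iql
             --seed "$seed" --env_name antmaze-large-diverse-v2
             --skill_temperature 10 --skill_expectile 0.9
             --num_pretraining_steps 1000000 --save_interval 1000000
             --max_steps 1000000)
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "${cmd[*]}"   # print the command only
  else
    "${cmd[@]}"        # launch the run
  fi
}

# Seeds 0..2 are illustrative (assumption); adjust as needed.
for seed in 0 1 2; do
  run_seed "$seed"
done
```

Saved as, say, `sweep.sh`, running `bash sweep.sh` prints the three commands and `DRY_RUN=0 bash sweep.sh` executes them sequentially. Whether runs that share the same `--run_group` keep separate output directories per seed is not stated here, so it is worth checking before launching a full sweep.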

## License

MIT