# Unsupervised-to-Online RL (U2O RL)

## Requirements
* Python 3.8

## Installation
```
conda create --name=u2o_zsrl python=3.8
conda activate u2o_zsrl
pip install -r requirements.txt
```

## Examples

### State-Based ExORL
* Download the ExORL datasets following the instructions at https://github.com/denisyarats/exorl.
* Convert the dataset using `convert.py`:
```
# Convert RND Walker
python convert.py --save_path=PATH_TO_SAVE --env=walker --task=run --method=rnd --num_episodes=5000 --use_pixels=0
# Convert RND Cheetah 
python convert.py --save_path=PATH_TO_SAVE --env=cheetah --task=run --method=rnd --num_episodes=5000 --use_pixels=0
# Convert RND Quadruped
python convert.py --save_path=PATH_TO_SAVE --env=quadruped --task=run --method=rnd --num_episodes=5000 --use_pixels=0
# Convert RND Jaco
python convert.py --save_path=PATH_TO_SAVE --env=jaco --task=reach_top_left --method=rnd --num_episodes=20000 --use_pixels=0
```
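
The four conversion commands differ only in environment, task, and episode count, so they can be generated from one template. The following is a minimal sketch and is not part of the repository: `convert_cmd` and `print_all_convert_cmds` are hypothetical helper names, and `SAVE_PATH` is an assumed variable that falls back to the `PATH_TO_SAVE` placeholder. It only prints the commands; it does not run them.

```shell
# Assumption: SAVE_PATH is where converted datasets should be written.
# It defaults to the PATH_TO_SAVE placeholder used in the commands above.
SAVE_PATH="${SAVE_PATH:-PATH_TO_SAVE}"

# Hypothetical helper: print the convert.py command for one dataset.
# Arguments: $1 = env, $2 = task, $3 = number of episodes.
convert_cmd() {
  echo "python convert.py --save_path=${SAVE_PATH} --env=$1 --task=$2 --method=rnd --num_episodes=$3 --use_pixels=0"
}

# Hypothetical helper: print the commands for the four RND datasets listed above.
print_all_convert_cmds() {
  convert_cmd walker run 5000
  convert_cmd cheetah run 5000
  convert_cmd quadruped run 5000
  convert_cmd jaco reach_top_left 20000
}

print_all_convert_cmds
```

After checking the printed commands, they can be executed from the repository root by piping them to a shell, for example `print_all_convert_cmds | sh`.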
* Train policies:
```
# HILP on RND Walker
PYTHONPATH=. python url_benchmark/train_offline_online.py run_group=EXP device=cuda agent=sf agent.feature_learner=hilp p_randomgoal=0.375 agent.hilp_expectile=0.5 agent.hilp_discount=0.96 agent.q_loss=True agent.mix_ratio=0 seed=0 task=walker_run expl_agent=rnd load_replay_buffer=PATH_TO_DATASET/datasets/walker/rnd/replay.pt replay_buffer_episodes=5000 num_grad_pretrain_steps=1000000 num_grad_finetune_steps=1000000 agent.use_rew_norm=True agent.preprocess=True experiment=hilp
# HILP on RND Cheetah
PYTHONPATH=. python url_benchmark/train_offline_online.py run_group=EXP device=cuda agent=sf agent.feature_learner=hilp p_randomgoal=0.375 agent.hilp_expectile=0.5 agent.hilp_discount=0.98 agent.q_loss=True agent.mix_ratio=0 seed=0 task=cheetah_run expl_agent=rnd load_replay_buffer=PATH_TO_DATASET/datasets/cheetah/rnd/replay.pt replay_buffer_episodes=5000 num_grad_pretrain_steps=1000000 num_grad_finetune_steps=1000000 agent.use_rew_norm=True agent.preprocess=True experiment=hilp
# HILP on RND Quadruped
PYTHONPATH=. python url_benchmark/train_offline_online.py run_group=EXP device=cuda agent=sf agent.feature_learner=hilp p_randomgoal=0.375 agent.hilp_expectile=0.5 agent.hilp_discount=0.98 agent.q_loss=True agent.mix_ratio=0 seed=0 task=quadruped_run expl_agent=rnd load_replay_buffer=PATH_TO_DATASET/datasets/quadruped/rnd/replay.pt replay_buffer_episodes=5000 num_grad_pretrain_steps=1000000 num_grad_finetune_steps=1000000 agent.use_rew_norm=True agent.preprocess=True experiment=hilp
# HILP on RND Jaco
PYTHONPATH=. python url_benchmark/train_offline_online.py run_group=EXP device=cuda agent=sf agent.feature_learner=hilp p_randomgoal=0.375 agent.hilp_expectile=0.5 agent.hilp_discount=0.98 agent.q_loss=True agent.mix_ratio=0 seed=0 task=jaco_reach_top_left expl_agent=rnd load_replay_buffer=PATH_TO_DATASET/datasets/jaco/rnd/replay.pt replay_buffer_episodes=20000 num_grad_pretrain_steps=1000000 num_grad_finetune_steps=1000000 agent.use_rew_norm=True agent.preprocess=True experiment=hilp
```

## Troubleshooting
* If you encounter the error `unexpected keyword argument 'audio_path'`, upgrade `imageio_ffmpeg`:
```
pip install -U imageio_ffmpeg
```

## License
MIT