This repository contains the implementation of FoG, a foundation model guided skill discovery method.
The implementation is based on
[METRA](https://github.com/seohongpark/METRA). If you are interested in the changes we made to the METRA codebase, please see:
- Foundation models are defined in ```fm.py```.
- How the foundation models are used, i.e. the score function, is defined in ```iod/skiller.py```.
- The logic for using foundation models to score states is added in ```iod/metra.py```.

Please note that there are some unused flags; we will further clean up the codebase for the camera-ready version.

## Requirements
- Python 3.8

## Installation

```
conda create --name metra python=3.8
conda activate metra
pip install -r requirements.txt --no-deps
pip install -e .
pip install -e garaged
```

## Examples

For all state-based experiments, make sure that only the score function for your task is active in ```GPTOutputSkiller()``` in ```iod/skiller.py```, and comment out the others. For all pixel-based experiments, change the textual intentions in ```CLIP()``` in ```fm.py``` to the ones you would like to use.
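
As a rough illustration of what a task-specific score function for a state-based task might look like, the sketch below scores HalfCheetah-like states by whether the torso is upright. It is not the code in ```iod/skiller.py```: the function name `score_not_flipped`, the observation index of the torso pitch, and the threshold are all assumptions made for illustration only.

```python
import math
from typing import List, Sequence

# Hypothetical values, chosen only for illustration.
TORSO_PITCH_INDEX = 1         # assumed position of the torso pitch in the observation
MAX_ABS_PITCH = math.pi / 2   # assumed angle beyond which the agent counts as flipped


def score_not_flipped(states: Sequence[Sequence[float]]) -> List[float]:
    """Return 1.0 for states judged upright and 0.0 for states judged flipped."""
    return [
        1.0 if abs(state[TORSO_PITCH_INDEX]) < MAX_ABS_PITCH else 0.0
        for state in states
    ]


if __name__ == "__main__":
    # Three toy observations: upright, flipped, slightly tilted.
    batch = [[0.0, 0.0], [0.0, 2.0], [0.0, -0.5]]
    print(score_not_flipped(batch))
```

A binary score like this is the simplest choice; the actual score functions in the repository may use different state features or thresholds.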

```
# Experiments in Section 5.1
# FoG on state-based HalfCheetah (to not flip)
python tests/main.py --run_group halfcheetah_FoG --env half_cheetah --max_path_length 200 --seed 0 --traj_batch_size 8 --n_parallel 1 --normalizer_type preset --trans_optimization_epochs 50 --n_epochs_per_log 100 --n_epochs_per_eval 1000 --n_epochs_per_save 10000 --sac_max_buffer_size 1000000 --algo metra --discrete 0 --dim_option 2 --skill_reward

# Experiments in Section 5.1
# FoG on state-based Ant (to not go south)
python tests/main.py --run_group ant_FoG --env ant --max_path_length 200 --seed 0 --traj_batch_size 8 --n_parallel 1 --normalizer_type preset --eval_plot_axis -50 50 -50 50 --trans_optimization_epochs 50 --n_epochs_per_log 100 --n_epochs_per_eval 1000 --n_epochs_per_save 10000 --sac_max_buffer_size 1000000 --algo metra --discrete 0 --dim_option 2 --skill_reward

# Experiments in Section 5.2.1
# FoG on pixel-based Cheetah (to not flip).

# Experiments in Section 5.2.2
# FoG on pixel-based DMC Cheetah (to avoid hazardous areas)
python tests/main.py --run_group Cheetah_FoG --env dmc_cheetah --max_path_length 200 --seed 0 --traj_batch_size 8 --n_parallel 4 --normalizer_type off --video_skip_frames 2 --frame_stack 3 --sac_max_buffer_size 300000 --eval_plot_axis -15 15 -15 15 --algo metra --trans_optimization_epochs 200 --n_epochs_per_log 25 --n_epochs_per_eval 125 --n_epochs_per_save 1000 --n_epochs_per_pt_save 1000 --discrete 0 --dim_option 4 --encoder 1 --sample_cpu 0 --skill_reward_op 'mul' --skill_reward_type 'prob' --skill_reward_coef 1.0 --skill_reward

# Experiments in Section 5.2.2
# FoG on pixel-based Quadruped (to avoid hazardous areas)
python tests/main.py --run_group Quadruped_FoG --env dmc_quadruped --max_path_length 200 --seed 0 --traj_batch_size 8 --n_parallel 4 --normalizer_type off --video_skip_frames 2 --frame_stack 3 --sac_max_buffer_size 300000 --eval_plot_axis -15 15 -15 15 --algo metra --trans_optimization_epochs 200 --n_epochs_per_log 25 --n_epochs_per_eval 125 --n_epochs_per_save 1000 --n_epochs_per_pt_save 1000 --discrete 0 --dim_option 4 --encoder 1 --sample_cpu 0 --skill_reward_op 'mul' --skill_reward_type 'onehot' --skill_reward_coef 1.0 --skill_reward

# Experiments in Section 5.2.3
# FoG on pixel-based Humanoid (to twist)
python tests/main.py --run_group humanoid_FoG_twisted_0.1 --env dmc_humanoid --max_path_length 200 --seed 0 --traj_batch_size 8 --n_parallel 4 --normalizer_type off --video_skip_frames 2 --frame_stack 3 --sac_max_buffer_size 300000 --eval_plot_axis -15 15 -15 15 --algo metra --trans_optimization_epochs 200 --n_epochs_per_log 25 --n_epochs_per_eval 125 --n_epochs_per_save 1000 --n_epochs_per_pt_save 1000 --discrete 0 --dim_option 2 --encoder 1 --sample_cpu 0 --skill_reward_op 'mul' --skill_reward_type 'onehot' --skill_reward_coef 1.0 --skill_reward

# Experiments in A.3
# Downstream tasks on pixel-based Cheetah (Please set --cp_path to the path where your pre-trained reward model is.)
python tests/main.py --run_group Downstream_Cheetah --env dmc_cheetah_goal --max_path_length 4 --dim_option 2 --num_random_trajectories 48 --seed 0 --normalizer_type off --use_gpu 1 --traj_batch_size 8 --n_parallel 4 --algo sac --n_epochs_per_eval 25 --n_thread 1 --model_master_dim 1024 --n_epochs_per_log 10 --n_epochs_per_save 0 --n_epochs_per_pt_save 0 --n_epochs_per_pkl_update 0 --eval_record_video 1 --n_epochs 200001 --discrete 0 --sac_discount 0.99 --video_skip_frames 2 --frame_stack 3 --trans_optimization_epochs 50 --sac_max_buffer_size 300000 --eval_plot_axis -15 15 -15 15 --common_lr 0.0001 --trans_minibatch_size 256 --encoder 1 --sample_cpu 0 --goal_range 10 --cp_multi_step 50 --downstream_reward_type esparse --downstream_num_goal_steps 200 --cp_path your_path/option_policy2000.pt --cp_path_idx 0 --cp_unit_length 1 --alpha 0.1

```
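
The pixel-based commands above pass `--skill_reward_op 'mul'`, `--skill_reward_type 'prob'` or `'onehot'`, and `--skill_reward_coef 1.0`. The sketch below shows one plausible reading of these flags, under the assumption that the foundation model returns one logit per textual intention and that the resulting score multiplies the skill discovery reward. The function names, the `desired_index` argument, and the exact formulas are hypothetical; please check ```iod/metra.py``` and ```fm.py``` for the actual behavior.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fm_score(logits: Sequence[float], desired_index: int = 0,
             reward_type: str = "prob") -> float:
    """Turn per-intention logits into a scalar score (assumed semantics)."""
    probs = softmax(logits)
    if reward_type == "prob":
        # Probability assigned to the desired textual intention.
        return probs[desired_index]
    if reward_type == "onehot":
        # 1.0 only if the desired intention is the most likely one.
        best = max(range(len(probs)), key=probs.__getitem__)
        return 1.0 if best == desired_index else 0.0
    raise ValueError(f"unsupported reward_type in this sketch: {reward_type}")


def combine(intrinsic_reward: float, score: float,
            op: str = "mul", coef: float = 1.0) -> float:
    """Combine the skill discovery reward with the score (assumed semantics)."""
    if op == "mul":
        return intrinsic_reward * coef * score
    raise ValueError(f"unsupported op in this sketch: {op}")


if __name__ == "__main__":
    logits = [2.0, 0.0]  # toy logits for [desired intention, undesired intention]
    print(combine(1.5, fm_score(logits, reward_type="prob")))
    print(combine(1.5, fm_score(logits, reward_type="onehot")))
```

Only the `'mul'` operation is sketched here because it is the only value used in the commands above.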
