# IN-RIL

This codebase includes the implementation of IN-RIL with gradient surgery, as well as the integration of IN-RIL with two state-of-the-art RL fine-tuning methods, DPPO and IDQL. The sections below cover the environment setup and the instructions for running the experiments. All code and configs are included.

The codebase is built upon [DPPO](https://github.com/irom-princeton/dppo). We include their license.

## Installation 

Install core dependencies with a conda environment (if you do not plan to use Furniture-Bench, a higher Python version such as 3.10 can be installed instead) on a Linux machine with an NVIDIA GPU.
```console
conda create -n dppo python=3.8 -y
conda activate dppo
pip install -e .
```

Install specific environment dependencies (Gym / Kitchen / Robomimic / Furniture-Bench) or all dependencies (except for Kitchen, which has dependency conflicts with other tasks).
```console
pip install -e .[gym] # or [kitchen], [robomimic], [furniture]
pip install -e .[all] # except for Kitchen
```

[Install MuJoCo for Gym and/or Robomimic](installation/install_mujoco.md). [Install IsaacGym and Furniture-Bench](installation/install_furniture.md).

Set the environment variables for the data and logging directories (defaults are `data/` and `log/`), and set the WandB entity (username or team name):

```console
source script/set_path.sh
```
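
To confirm that the variables are in place before launching a run, a small check like the one below may help. It is only a sketch: the names `DPPO_DATA_DIR`, `DPPO_LOG_DIR`, and `DPPO_WANDB_ENTITY` are assumed to be the ones exported by `script/set_path.sh`, and `check_dppo_env` is a hypothetical helper that is not part of the codebase.

```shell
# Hypothetical helper: report which of the expected variables are unset or empty.
# Assumption: script/set_path.sh exports the three names listed in the loop.
check_dppo_env() {
    missing=""
    for name in DPPO_DATA_DIR DPPO_LOG_DIR DPPO_WANDB_ENTITY; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "unset:$missing"
        return 1
    fi
    echo "all set"
}

check_dppo_env || echo "Run: source script/set_path.sh"
```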

## Usage - Pre-training

**Note**: You may skip pre-training if you would like to use the default checkpoint (available for download) for fine-tuning.

<!-- ### Prepare pre-training data

First create a directory as the parent directory of the pre-training data and set the environment variable for it.
```console
export DPPO_DATA_DIR=/path/to/data -->
<!-- ``` -->

Pre-training data for all tasks are pre-processed and can be found [here](https://drive.google.com/drive/folders/1AXZvNQEKOrp0_jk1VLepKh_oHCg_9e3r?usp=drive_link). The pre-training script downloads the data (including normalization statistics) automatically to the data directory.
<!-- The data path follows `${DPPO_DATA_DIR}/<benchmark>/<task>/train.npz`, e.g., `${DPPO_DATA_DIR}/gym/hopper-medium-v2/train.npz`. -->

### Run pre-training with data

All the configs can be found under `cfg/<env>/pretrain/`. A new WandB project may be created based on `wandb.project` in the config file; set `wandb=null` on the command line to test without WandB logging.
<!-- To run pre-training, first set your WandB entity (username or team name) and the parent directory for logging as environment variables. -->
<!-- ```console
export DPPO_WANDB_ENTITY=<your_wandb_entity>
export DPPO_LOG_DIR=<your_prefered_logging_directory>
``` -->
```console
# Gym - hopper/walker2d/halfcheetah
python script/run.py --config-name=pre_diffusion_mlp \
    --config-dir=cfg/gym/pretrain/hopper-medium-v2
# Robomimic - lift/can/square/transport
python script/run.py --config-name=pre_diffusion_mlp \
    --config-dir=cfg/robomimic/pretrain/can
python script/run.py --config-name=pre_diffusion_mlp_ta1_ph \
    --config-dir=cfg/robomimic/pretrain/square
python script/run.py --config-name=pre_diffusion_mlp \
    --config-dir=cfg/robomimic/pretrain/transport
```
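
For a quick test without WandB, the `wandb=null` override described above is appended after the config options. The sketch below wraps the command in a hypothetical `run` helper (not part of the codebase) that only prints the command unless `DRY_RUN=0` is set, so the exact invocation can be inspected first.

```shell
# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Gym hopper pre-training with WandB logging disabled.
run python script/run.py --config-name=pre_diffusion_mlp \
    --config-dir=cfg/gym/pretrain/hopper-medium-v2 wandb=null
```

Setting `DRY_RUN=0` in the shell before calling `run` executes the command instead of printing it.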

## Usage - Fine-tuning

<!-- ### Set up pre-trained policy -->

<!-- If you did not set the environment variables for pre-training, we need to set them here for fine-tuning. 
```console
export DPPO_WANDB_ENTITY=<your_wandb_entity>
export DPPO_LOG_DIR=<your_prefered_logging_directory>
``` -->
<!-- First create a directory as the parent directory of the downloaded checkpoints and set the environment variable for it.
```console
export DPPO_LOG_DIR=/path/to/checkpoint
``` -->

Pre-trained policies used in the paper can be found [here](https://drive.google.com/drive/folders/1ZlFqmhxC4S8Xh1pzZ-fXYzS5-P8sfpiP?usp=drive_link). The fine-tuning script downloads the default checkpoint automatically to the logging directory.
 <!-- or you may manually download other ones (different epochs) or use your own pre-trained policy if you like. -->

 <!-- e.g., `${DPPO_LOG_DIR}/gym-pretrain/hopper-medium-v2_pre_diffusion_mlp_ta4_td20/2024-08-26_22-31-03_42/checkpoint/state_0.pt`. -->

<!-- The checkpoint path follows `${DPPO_LOG_DIR}/<benchmark>/<task>/.../<run>/checkpoint/state_<epoch>.pt`. -->

### Fine-tuning pre-trained policy with RL

All the configs can be found under `cfg/<env>/finetune/`. A new WandB project may be created based on `wandb.project` in the config file; set `wandb=null` on the command line to test without WandB logging.
<!-- Running them will download the default pre-trained policy. -->
<!-- Running the script will download the default pre-trained policy checkpoint specified in the config (`base_policy_path`) automatically, as well as the normalization statistics, to `DPPO_LOG_DIR`.  -->

Gym

```console
# Gym - hopper/walker2d/halfcheetah
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/gym/finetune/hopper-v2
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/gym/finetune/halfcheetah-v2
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/gym/finetune/walker2d-v2
```

Robomimic

```console
# Robomimic - lift/can/square/transport
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/robomimic/finetune/can
```
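
The three Gym commands above differ only in the task directory, so they can be generated in a loop. This is a sketch rather than a script shipped with the codebase: `gym_finetune_cmds` is a hypothetical name, and it prints the commands so that they can be reviewed before anything is launched.

```shell
# Hypothetical helper: print one DPPO fine-tuning command per Gym task
# listed in this README. Extra arguments are appended to every command.
gym_finetune_cmds() {
    extra=""
    if [ "$#" -gt 0 ]; then
        extra=" $*"
    fi
    for task in hopper-v2 halfcheetah-v2 walker2d-v2; do
        echo "python script/run.py --config-name=ft_ppo_diffusion_mlp --config-dir=cfg/gym/finetune/$task$extra"
    done
}

# Example: add the wandb=null override to all three runs.
gym_finetune_cmds wandb=null
```

Piping the output to a shell (`gym_finetune_cmds wandb=null | sh`) would run the three jobs one after another.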

### Fine-tuning pre-trained policy with RI

Gym + RI:

```console
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/gym/ri/hopper-v2
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/gym/ri/halfcheetah-v2
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/gym/ri/walker2d-v2

# Residual
python script/run.py --config-name=ft_residual_diffusion_mlp \
    --config-dir=cfg/gym/ri/hopper-v2
```

Robomimic + RI

```console
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/robomimic/ri/can
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/robomimic/ri/transport
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/robomimic/ri/lift
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/robomimic/ri/square
```

Image-based (Robomimic transport):
```console
# DPPO RL
python script/run.py --config-name=ft_ppo_diffusion_mlp_img \
    --config-dir=cfg/robomimic/finetune/transport
# IDQL RL
python script/run.py --config-name=ft_idql_diffusion_mlp_img \
    --config-dir=cfg/robomimic/finetune/transport

# DPPO RI + UPGrad (the `_grad` config under `ri/`)
python script/run.py --config-name=ft_ppo_diffusion_mlp_img_grad \
    --config-dir=cfg/robomimic/ri/transport
```

### Fine-tuning pre-trained policy with RI + UPGrad

Gym + RI:

```console
python script/run.py --config-name=ft_ppo_diffusion_mlp_grad \
    --config-dir=cfg/gym/ri/hopper-v2
python script/run.py --config-name=ft_ppo_diffusion_mlp_grad \
    --config-dir=cfg/gym/ri/halfcheetah-v2
python script/run.py --config-name=ft_ppo_diffusion_mlp_grad \
    --config-dir=cfg/gym/ri/walker2d-v2
```


Robomimic:

```console
python script/run.py --config-name=ft_ppo_diffusion_mlp_grad \
    --config-dir=cfg/robomimic/ri/can
python script/run.py --config-name=ft_ppo_diffusion_mlp_grad \
    --config-dir=cfg/robomimic/ri/lift
python script/run.py --config-name=ft_ppo_diffusion_mlp_grad \
    --config-dir=cfg/robomimic/ri/square
python script/run.py --config-name=ft_ppo_diffusion_mlp_grad \
    --config-dir=cfg/robomimic/ri/transport
```

Robomimic PH:

```console
# DPPO
python script/run.py --config-name=ft_ppo_diffusion_mlp_grad_ph \
    --config-dir=cfg/robomimic/ri/square
python script/run.py --config-name=ft_ppo_diffusion_mlp_grad_ph \
    --config-dir=cfg/robomimic/ri/can
# IDQL
python script/run.py --config-name=ft_idql_diffusion_mlp_grad_ph \
    --config-dir=cfg/robomimic/ri/square
python script/run.py --config-name=ft_idql_diffusion_mlp_grad_ph \
    --config-dir=cfg/robomimic/ri/can
```

### IDQL RI

Gym + RI + UPGrad:

```console
python script/run.py --config-name=ft_idql_diffusion_mlp_grad \
    --config-dir=cfg/gym/ri/hopper-v2
python script/run.py --config-name=ft_idql_diffusion_mlp_grad \
    --config-dir=cfg/gym/ri/halfcheetah-v2
python script/run.py --config-name=ft_idql_diffusion_mlp_grad \
    --config-dir=cfg/gym/ri/walker2d-v2
```
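
The state-based fine-tuning commands in the sections above follow a naming pattern: RL-only runs use configs under `finetune/`, RI runs use the same config name under `ri/`, and RI + UPGrad runs add a `_grad` suffix. The sketch below encodes that pattern in a hypothetical `inril_cmd` helper (not part of the codebase). It assumes the pattern holds for the requested combination; only the combinations listed in this README are known to have config files, and the `_img` and `_ph` variants are not covered.

```shell
# Hypothetical helper: map (algo, mode, benchmark, task) to a run.py command.
#   algo: ppo | idql        mode: rl | ri | ri_grad
inril_cmd() {
    algo=$1; mode=$2; benchmark=$3; task=$4
    case "$mode" in
        rl)      name="ft_${algo}_diffusion_mlp";      dir="finetune" ;;
        ri)      name="ft_${algo}_diffusion_mlp";      dir="ri" ;;
        ri_grad) name="ft_${algo}_diffusion_mlp_grad"; dir="ri" ;;
        *)       echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "python script/run.py --config-name=$name --config-dir=cfg/$benchmark/$dir/$task"
}

inril_cmd ppo rl gym hopper-v2        # RL only
inril_cmd ppo ri gym hopper-v2        # RI
inril_cmd idql ri_grad gym hopper-v2  # RI + UPGrad
```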

### Other Baselines

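Robomimic (replace `<algo_name>` with the name of the baseline algorithm as it appears in the config filenames; `device` selects the GPU for each run):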

```console
python script/run.py --config-name=ft_<algo_name>_diffusion_mlp \
    --config-dir=cfg/robomimic/finetune/transport device=cuda:0
python script/run.py --config-name=ft_<algo_name>_diffusion_mlp \
    --config-dir=cfg/robomimic/finetune/can device=cuda:1
python script/run.py --config-name=ft_<algo_name>_diffusion_mlp \
    --config-dir=cfg/robomimic/finetune/lift device=cuda:2
python script/run.py --config-name=ft_<algo_name>_diffusion_mlp \
    --config-dir=cfg/robomimic/finetune/square device=cuda:3
```
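
The four baseline commands above assign one GPU per task. A sketch of generating them for a given algorithm name follows; `baseline_cmds` is a hypothetical helper (not part of the codebase), and the value passed to it must correspond to a config file that actually exists in the repository.

```shell
# Hypothetical helper: print one baseline command per Robomimic task,
# assigning GPUs cuda:0, cuda:1, ... in the task order used above.
baseline_cmds() {
    algo=$1   # value that replaces <algo_name>; must match a config in the repo
    gpu=0
    for task in transport can lift square; do
        echo "python script/run.py --config-name=ft_${algo}_diffusion_mlp --config-dir=cfg/robomimic/finetune/$task device=cuda:$gpu"
        gpu=$((gpu + 1))
    done
}

# "ppo" is used here only because ft_ppo_diffusion_mlp appears earlier in this README.
baseline_cmds ppo
```

To launch the runs concurrently, each printed line can be started in the background with `&`, followed by a single `wait`.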
