# TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction

## Table of Contents
1. [Installation](#installation)
2. [Usage](#usage)

## Installation
First, follow the instructions in the `transic-envs` folder to create a virtual environment, install IsaacGym, and install our simulation codebase `transic-envs`.

Now install this codebase.
```bash
pip3 install -e .
```

## Usage
### Training Teacher Policies
The basic syntax to launch teacher policy RL training is
```bash
python3 main/rl/train.py task=<task_name> num_envs=<num_of_parallel_envs> \
  sim_device=cuda:<gpu_id> rl_device=cuda:<gpu_id> graphics_device_id=<gpu_id>
```
You need to replace anything within `<>` with suitable values.
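
As a sketch of how the placeholders can be filled, the snippet below assembles the training command from a dictionary of `key=value` overrides. The helper name `build_command` and the example values (`Insert`, `4096`, GPU `0`) are illustrative assumptions, not values prescribed by this codebase.

```python
import shlex


def build_command(script, overrides):
    """Return an argv list of the form: python3 <script> key=value ..."""
    argv = ["python3", script]
    for key, value in overrides.items():
        argv.append(f"{key}={value}")
    return argv


gpu_id = 0  # assumption: use the first GPU for simulation, RL, and graphics
teacher_cmd = build_command(
    "main/rl/train.py",
    {
        "task": "Insert",  # hypothetical value; use a task defined in transic-envs
        "num_envs": 4096,  # hypothetical value; tune to your GPU memory
        "sim_device": f"cuda:{gpu_id}",
        "rl_device": f"cuda:{gpu_id}",
        "graphics_device_id": gpu_id,
    },
)

# Print a copy-pasteable shell command.
print(shlex.join(teacher_cmd))
```

To launch the run from Python instead of printing it, `teacher_cmd` can be passed to `subprocess.run`.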

The training command will create a folder called `runs/{experiment_name}` under the current directory, where you can find the training config and saved checkpoints.

To test a checkpoint, run the following command.
```bash
python3 main/rl/train.py task=<task_name> num_envs=<num_of_parallel_envs> \
  test=true checkpoint=<path_to_your_checkpoint>
```

### Training Student Policies
#### Prepare the Training Data
We use trained teacher policies to generate data for student policies. To do so, simply run the following command.
```bash
python3 main/rl/train.py task=<task_name> num_envs=<num_of_parallel_envs> \
  test=true checkpoint=<path_to_your_checkpoint> \
  save_rollouts=true
```
Rollouts will be saved in `runs/{experiment_name}/rollouts.hdf5`.
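
Before training a student, it can help to check what was written. The sketch below lists every dataset and its shape without assuming any particular layout inside `rollouts.hdf5`. It relies on the third-party `h5py` package (an assumption; this README does not list it as a dependency), and the helper names `list_datasets` and `print_rollout_layout` are hypothetical.

```python
def list_datasets(node, prefix=""):
    """Recursively collect (path, shape) for every dataset under node.

    Works with h5py.File and h5py.Group objects: groups expose .items(),
    datasets expose .shape.
    """
    found = []
    for name, child in node.items():
        path = f"{prefix}/{name}"
        if hasattr(child, "shape"):  # dataset (leaf)
            found.append((path, tuple(child.shape)))
        else:  # group (has children)
            found.extend(list_datasets(child, path))
    return found


def print_rollout_layout(hdf5_path):
    """Open the rollout file read-only and print each dataset path and shape."""
    import h5py  # third-party; imported lazily so list_datasets works without it

    with h5py.File(hdf5_path, "r") as f:
        for path, shape in list_datasets(f):
            print(path, shape)
```

For example, call `print_rollout_layout("runs/<experiment_name>/rollouts.hdf5")` with your own experiment name filled in.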

#### Start Training
The basic syntax to launch student policy distillation is
```bash
python3 main/distillation/train.py task=<task_name> distillation_student_arch=<arch> \
  bs=<batch_size> num_envs=<num_of_parallel_envs> exp_root_dir=<where_to_log_experiment> \
  data_path=<path_to_hdf5_file> matched_scene_data_path=<path_to_matched_scene_data> \
  sim_device=cuda:<gpu_id> rl_device=cuda:<gpu_id> graphics_device_id=<gpu_id> gpus=\[<gpus>\] \
  wandb_project=<your_wandb_project_name>
```
Similarly, you need to replace anything within `<>` with suitable values. You can select either `pointnet` or `rnn_pointnet` for the policy architecture `distillation_student_arch`. You may need to tune the batch size `bs` and the number of parallel environments `num_envs` to fit your GPU memory. `exp_root_dir` specifies where the experiment is logged, and `data_path` points to the generated rollouts.

The experiment will be logged at `exp_root_dir`, where you can find the saved config, logs, tensorboard, and checkpoints. Because training periodically alternates with simulation evaluation, policies are saved based on their success rates.

To test and visualize trained student policies, run the following command.
```bash
python3 main/distillation/test.py task=<task_name> distillation_student_arch=<arch> \
  bs=null num_envs=<num_of_parallel_envs> exp_root_dir=<where_to_log_experiment> \
  data_path=null matched_scene_data_path=null \
  test.ckpt_path=<path_to_student_policy> display=true
```

### Correction Data Collection
Once we have the simulation base policy, we deploy it on a real robot while a human operator monitors its execution. The human operator intervenes in the policy execution when necessary and provides corrections through teleoperation. To collect such correction data, check out the script
```bash
python3 main/correction_data_collection.py \
  --base-policy-ckpt-path <path_to_simulation_base_policy_ckpt> \
  --data-save-path <where_to_save_correction_data>
```
Note that the real-world observation pipeline and the real-robot controller may differ across groups. Therefore, you have to fill in the instantiation of these two components in the script.
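
One way to organize the two lab-specific components is behind small interfaces, so that the collection loop stays independent of the hardware. The sketch below is not the contents of `main/correction_data_collection.py`; every class, method, and field name in it is a hypothetical placeholder.

```python
from abc import ABC, abstractmethod


class ObservationPipeline(ABC):
    """Hypothetical interface for the lab-specific perception stack."""

    @abstractmethod
    def get_observation(self):
        """Return the current real-world observation for the policy."""


class RobotController(ABC):
    """Hypothetical interface for the lab-specific robot controller."""

    @abstractmethod
    def execute(self, action):
        """Send one action to the real robot."""


def collect_corrections(policy, obs_pipeline, controller, get_human_action, num_steps):
    """Run the base policy and log every step, flagging operator interventions.

    get_human_action(obs) returns a teleoperated action, or None when the
    operator does not intervene at this step.
    """
    records = []
    for _ in range(num_steps):
        obs = obs_pipeline.get_observation()
        policy_action = policy(obs)
        human_action = get_human_action(obs)
        intervened = human_action is not None
        # The operator's action overrides the policy whenever it is provided.
        controller.execute(human_action if intervened else policy_action)
        records.append(
            {
                "obs": obs,
                "policy_action": policy_action,
                "human_action": human_action,
                "intervened": intervened,
            }
        )
    return records
```

Keeping the non-intervened steps in the log is a design choice of this sketch: it leaves both positive and negative examples available for a head that predicts whether to intervene.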

### Training Residual Policies
Once we have enough correction data, we can train residual policies in two steps. First, we learn only the residual action head.
```bash
python3 main/residual/train.py residual_policy_arch=<arch> \
  data_dir=<correction_data_path> exp_root_dir=<where_to_log_experiment> \
  residual_policy_task=<task> \
  gpus=<gpus> bs=<batch_size> \
  module.intervention_pred_loss_weight=0.0 \
  wandb_project=<your_wandb_project_name>
```
For `residual_policy_task`, use `insert` for the task Insert and `default` for others.

We then freeze everything else and learn only the head that predicts whether to intervene.
```bash
python3 main/residual/train.py residual_policy_arch=<arch> \
  data_dir=<correction_data_path> exp_root_dir=<where_to_log_experiment> \
  residual_policy_task=<task> \
  gpus=<gpus> bs=<batch_size> \
  module.residual_policy.update_intervention_head_only=True \
  module.residual_policy.ckpt_path_if_update_intervention_head_only=<path_to_ckpt_from_the_first_step> \
  wandb_project=<your_wandb_project_name>
```

### Integrated Deployment
Once we have both the simulation base policy and the residual policy, we can integrate them for successful sim-to-real transfer. Check out the script
```bash
python3 main/integrated_deployment.py \
  --base-policy-ckpt-path <path_to_simulation_base_policy_ckpt> \
  --residual-policy-ckpt-path <path_to_residual_policy_ckpt>
```
Similarly, you need to fill in the instantiation of the real-world observation pipeline and the real-robot controller.
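
The two heads trained in the residual policy section suggest one plausible way to combine the policies at deployment: add the residual to the base action only when the intervention head predicts that a correction is needed. The sketch below illustrates that gating under stated assumptions; the function name, the additive combination, and the `0.5` threshold are illustrative and may differ from what `main/integrated_deployment.py` does.

```python
def integrate_actions(base_action, residual_action, intervention_prob, threshold=0.5):
    """Return the action to execute on the robot.

    base_action / residual_action: equal-length sequences of floats.
    intervention_prob: output of the intervention head, assumed to lie in [0, 1].
    """
    if len(base_action) != len(residual_action):
        raise ValueError("base and residual actions must have the same length")
    if intervention_prob > threshold:
        # Correction predicted: apply the learned residual on top of the base action.
        return [b + r for b, r in zip(base_action, residual_action)]
    # No correction predicted: execute the base policy's action unchanged.
    return list(base_action)


# Hypothetical two-dimensional actions, once with and once without a predicted correction.
print(integrate_actions([1.0, 2.0], [0.5, -1.0], intervention_prob=0.9))
print(integrate_actions([1.0, 2.0], [0.5, -1.0], intervention_prob=0.1))
```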
