# Trust Region Inverse Reinforcement Learning (TRIRL)

- The `trirl_discrete` directory contains the discrete grid-world implementation of our method. It can be used to generate Figure 2 in the paper.
- The `trirl_continuous` directory contains the continuous-domain implementation for the MuJoCo and robotics experiments.
- Installation and execution instructions are given below.


## Installation

*Note*: We assume that you start the installation from the root directory, which contains both the `trirl_continuous` and `trirl_discrete` directories. We use wandb to log experiments.

Install RL-X (public reinforcement learning library - not associated with this submission)
```bash
conda create -n trirl python=3.11.4
conda activate trirl
git clone https://github.com/nico-bohlinger/RL-X.git && cd RL-X && git checkout 852a555 # clone a specific commit to ensure reproducibility
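pip install -e ".[all]" --config-settings editable_mode=compat # quotes keep shells such as zsh from expanding [all]
# remove any CUDA 12 wheels installed so far, then reinstall JAX with CUDA 12 support
pip uninstall $(pip freeze | grep -i '\-cu12' | cut -d '=' -f 1) -y
pip install -U "jax[cuda12]"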
```
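After these steps, it can be worth confirming that the environment ended up in the intended state. The following is a minimal sketch that uses only the Python standard library. The helper names `installed_version` and `cuda12_packages` are hypothetical and are not part of RL-X or this codebase. It prints the installed versions of a few packages and lists every installed distribution whose name contains `-cu12`, mirroring the `grep` pattern used above.

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def cuda12_packages(names=None):
    """Return sorted distribution names containing '-cu12' (case-insensitive).

    If `names` is None, the names of all installed distributions are inspected.
    """
    if names is None:
        names = [dist.metadata["Name"] for dist in metadata.distributions()]
    return sorted(n for n in names if n and "-cu12" in n.lower())


if __name__ == "__main__":
    # Package names below are the usual PyPI names; adjust as needed.
    for pkg in ("jax", "jaxlib", "wandb"):
        print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
    print("CUDA 12 wheels:", cuda12_packages() or "none")
```

Run it inside the activated `trirl` environment. A missing `jax` entry suggests that the last install step did not complete.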

Install this codebase
```bash
cd ../trirl_continuous
pip install -e .
```

Install LocoMuJoCo (public imitation learning library - not associated with this submission)
```bash
cd ..
git clone https://github.com/robfiras/loco-mujoco.git && cd loco-mujoco && git checkout 131c1e7 # clone a specific commit to ensure reproducibility
pip install -e . 
```

Extract provided datasets
```bash
cd ../trirl_continuous
tar -xJf expert_data.tar.xz
```
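To inspect the archive before or after extraction, the member names can be listed without unpacking anything. The snippet below is a small sketch based on the standard-library `tarfile` module. The helper `list_archive` is hypothetical, and no assumption is made about what the archive actually contains.

```python
import os
import tarfile


def list_archive(path, limit=10):
    """Return up to `limit` member names from an xz-compressed tar archive."""
    names = []
    with tarfile.open(path, mode="r:xz") as archive:
        for member in archive:
            if len(names) >= limit:
                break
            names.append(member.name)
    return names


if __name__ == "__main__":
    # Assumes the script is run from the directory holding the archive.
    if os.path.exists("expert_data.tar.xz"):
        for name in list_archive("expert_data.tar.xz"):
            print(name)
```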

## Experiments

Run the commands in this section from the root directory.

### Grid-World
```bash
cd trirl_discrete
python experiment.py # generates .pdf plots
```

### MuJoCo & Robotics
```bash
cd trirl_continuous/experiments
./run_experiment.sh
```

Algorithm names and default parameter configurations can be found in `trirl_continuous/irl_baselines/algorithms/<algo>`. For computationally expensive tasks such as the MuJoCo humanoid, Go2, and G1, we use full jitting to run both the algorithm and the environment entirely on the GPU. These versions are named `<name>_flax_full_jit` and `<name>_flax_loco_mjx`. Hyperparameters are listed in the paper.
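
To illustrate what full jitting means in general, the sketch below compiles a policy, a toy environment, and the whole rollout loop into a single JAX program, so that no Python code runs between environment steps. It is not taken from this codebase. The point-mass dynamics, the linear policy, and the names `env_step`, `policy`, and `rollout_return` are assumptions made purely for illustration.

```python
import jax
import jax.numpy as jnp


def env_step(state, action):
    """Toy point-mass dynamics (hypothetical stand-in for a real simulator)."""
    next_state = state + 0.1 * action
    reward = -jnp.sum(next_state ** 2)  # penalize distance from the origin
    return next_state, reward


def policy(params, state):
    """Linear policy: `params` is a (dim, dim) matrix."""
    return params @ state


@jax.jit
def rollout_return(params, init_state):
    """Run 100 environment steps inside one compiled program and sum the rewards."""

    def step(state, _):
        action = policy(params, state)
        next_state, reward = env_step(state, action)
        return next_state, reward

    # lax.scan keeps the loop inside the compiled computation.
    _, rewards = jax.lax.scan(step, init_state, None, length=100)
    return jnp.sum(rewards)


params = -jnp.eye(2)      # policy that pushes the state toward the origin
init_state = jnp.ones(2)

total = rollout_return(params, init_state)
# Because the environment is written in JAX, gradients flow through the rollout.
grads = jax.jit(jax.grad(rollout_return))(params, init_state)
print(total, grads.shape)
```

Since policy and environment live in the same compiled function, data stays on the accelerator for the whole rollout, which is the main benefit for expensive simulations.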