## Code for the paper "Contextual Policies Enable Efficient and Interpretable Inverse Reinforcement Learning for Populations"

### Experiment 1: m-D environment

First, generate the parameters for training the contextual policies:
`python generate_mdenv_cari_params.py`
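The exact parameters that `generate_mdenv_cari_params.py` writes are not documented here; as a rough illustration, a parameter-generation script of this kind typically enumerates a grid of configurations, one per slurm array task. A minimal sketch (the parameter names `n_dims`, `seed`, and `lr`, and the output filename, are hypothetical, not the script's actual ones):

```python
# Hypothetical sketch: enumerate a grid of training configurations and
# write them to a JSON file, one entry per slurm array task.
import itertools
import json

def make_param_grid(n_dims_values, seeds, learning_rates):
    """Cartesian product of the hyperparameter axes."""
    grid = []
    for n_dims, seed, lr in itertools.product(n_dims_values, seeds, learning_rates):
        grid.append({"n_dims": n_dims, "seed": seed, "lr": lr})
    return grid

if __name__ == "__main__":
    grid = make_param_grid([2, 4, 8], [0, 1, 2], [3e-4])
    with open("mdenv_cari_params.json", "w") as f:
        json.dump(grid, f, indent=2)
```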

Then launch the contextual policy training (using slurm and parallel computation) with
`sbatch mdenv_cari_launcher.job`

The job file `mdenv_cari_launcher.job` calls `cari_mdenv.py` to run the training.
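The actual resource requests and module loads in `mdenv_cari_launcher.job` are cluster-specific and not reproduced here; a generic slurm array job of this shape might look like the following sketch (the `--task-id` flag and all `#SBATCH` values are hypothetical):

```shell
#!/bin/bash
# Hypothetical slurm array job; the real mdenv_cari_launcher.job
# may use different resource requests, module loads, and arguments.
#SBATCH --job-name=cari_mdenv
#SBATCH --array=0-8            # one task per generated parameter set
#SBATCH --time=04:00:00
#SBATCH --mem=8G

# Each array task trains one contextual policy configuration.
python cari_mdenv.py --task-id "$SLURM_ARRAY_TASK_ID"
```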

Run `generate_scalability_params.py` to generate the experiment parameters for the IRL part.
Then launch the IRL part with
`sbatch experiment_scalability.job`
that parallelises the computation of the expert trajectories and the IRL runs over a varying number of sub-reward functions. `mdenv_generate_trajectories.py` generates the trajectories, and the actual IRL happens in `mdenv_irl.py`.
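How `mdenv_irl.py` maps a slurm array task to its parameter set is not shown above; a common pattern is to index the generated parameter file by `SLURM_ARRAY_TASK_ID`. A minimal sketch (the file name and JSON layout are assumptions):

```python
# Hypothetical sketch: each slurm array task picks its own parameter
# set out of the generated grid. The real scripts may differ.
import json
import os

def load_task_params(path, task_id):
    """Return the parameter dict for one slurm array task."""
    with open(path) as f:
        grid = json.load(f)
    return grid[task_id]

# Inside the worker, the index comes from the slurm array environment;
# default to 0 when running outside slurm.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
```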

For the sample-complexity experiment, run
`sbatch experiment_airl.sh`
to launch the experiment. It uses the `mdenv_airl.py` script for inference and `mdenv_generate_trajectories_airl.py` for simulating the trajectories (duplicated because it lives on a different branch in version control).

The environment is defined in `mdtoyenv.py`.
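The interface of `mdtoyenv.py` is not spelled out above. As a rough sketch of what an m-dimensional toy environment with context-weighted sub-rewards could look like (the class name, dynamics, and reward shape here are all illustrative assumptions, not the repository's actual implementation):

```python
# Hypothetical sketch of an m-dimensional toy environment with a
# gym-like reset/step interface; the real mdtoyenv.py will differ.
import numpy as np

class MDToyEnv:
    """Agent moves on an m-dimensional line; the reward is a weighted
    sum of per-dimension sub-rewards, with weights from the context."""

    def __init__(self, m, context, horizon=20):
        self.m = m
        self.context = np.asarray(context)  # one weight per sub-reward
        self.horizon = horizon

    def reset(self):
        self.t = 0
        self.state = np.zeros(self.m)
        return self.state.copy()

    def step(self, action):
        # action encodes a +-1 move in one dimension as an index in [0, 2m)
        dim, direction = divmod(action, 2)
        self.state[dim] += 1.0 if direction else -1.0
        self.t += 1
        # each sub-reward prefers its own dimension to stay near 1.0
        sub_rewards = -np.abs(self.state - 1.0)
        reward = float(self.context @ sub_rewards)
        done = self.t >= self.horizon
        return self.state.copy(), reward, done, {}
```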

### Experiment 2: Derk population experiment
Train a contextual policy with
`python train_cari_derk.py`

Generate expert trajectories with 
`python derk_generate_trajectories.py`

Run IRL on Derk with
`python derk_irl.py`


### Common files
- `PPO.py`: PPO implementation, modified from nikhilbarhate99 on GitHub
- `reward_fns.py`: reward functions and Derk's `RewardCalculator` class
- `util.py`: miscellaneous utilities used across tasks
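The sub-reward structure mentioned in Experiment 1 suggests rewards of the kind `reward_fns.py` would define: a context vector weighting a set of sub-reward functions. A minimal illustration of that idea (the function name and signature are hypothetical, not the file's actual API):

```python
# Hypothetical illustration of a contextual reward: the context vector
# weights a set of sub-reward functions evaluated on the same state.
# reward_fns.py may structure this differently.
import numpy as np

def contextual_reward(state, context, sub_reward_fns):
    """Weighted sum of sub-rewards; one context weight per sub-reward."""
    values = np.array([fn(state) for fn in sub_reward_fns])
    return float(np.asarray(context) @ values)
```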

