# A Ranking Game for Imitation Learning (anonymized submission for NeurIPS 2022)

## Conda Environment
```
conda env create -f environment.yml
```
Install D4RL from source: https://github.com/rail-berkeley/d4rl (comment out `dm_control` in `setup.py` before installing).

## Download expert data 
Download the expert data and preference data from this anonymized [link](https://drive.google.com/drive/folders/1KJayG61KqiHqtRbxnUrGPBxTX2oDccSn?usp=sharing) and extract them inside the `rank-game/samples/` folder.


## Algorithm
- RANK-GAME methods (auto and pref): `samples/irl.py` (ours)
- Preference learning offline: `samples/irl_offline.py` (ours)

## Running Code
Before running experiments, execute the following commands:
```
cd rank_game
export PYTHONPATH=${PWD}:$PYTHONPATH
```

To run the experiments:
```
# Imitation learning from expert demonstrations
python samples/irl.py <config_name>
# Imitation learning from expert demonstrations + preferences (set obj to pal-preferences-weighted in the config)
python samples/irl.py <config_name>
# Imitation learning from suboptimal preferences
python samples/irl_offline.py --config=<config_name> --exp_name=<exp_name>
```

Example `config_name`: `samples/configs/agents/walker2d.yml`. Configs for other environments can be found in `samples/configs/agents/`.

## Setting up the config

Key parameters to set in the environment config:
- `obj`: ranking loss used to set up the rank game. One of `rank-pal`, `rank-ral`, `pal-preferences-weighted`.
- `irl:epochs`: number of reward training epochs (int).
- `irl:regularization`: reward shaping parameterization. One of `exp-1`, `linear`.
- `sac:epochs`: number of policy training epochs (int).
- `seed`: seed for the training run (int).
- `exp_name`: relative location in which to store the training logs.
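
The `irl:epochs` notation appears to denote a key nested under a section of the config. Assuming that reading is right, the sketch below represents such a config as a nested Python dict and resolves colon-separated paths. The `example_config` values are placeholders rather than the repository's defaults, and `get_param` is a hypothetical helper, not a function from this codebase.

```python
from functools import reduce

# Hypothetical config; values are placeholders, not the repo's defaults.
example_config = {
    "obj": "rank-pal",
    "irl": {"epochs": 5, "regularization": "linear"},
    "sac": {"epochs": 5},
    "seed": 0,
    "exp_name": "logs/walker2d_rank_pal",
}


def get_param(config, path):
    """Look up a colon-separated key path such as 'irl:epochs'."""
    # Walk one level deeper into the nested dict for each path component.
    return reduce(lambda node, key: node[key], path.split(":"), config)


print(get_param(example_config, "irl:epochs"))
print(get_param(example_config, "obj"))
```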

IL from suboptimal preferences alone: first trains a reward function, then trains a policy on the learned reward function.
Only one objective is compatible with `irl_offline.py`: `snippet-validation-preferences-weighted`.

IL from expert data and possibly suboptimal preferences: trains the reward and the policy alternately through online interaction with the environment.
Objectives compatible with `irl.py`: `rank-pal`, `rank-ral`, `pal-preferences-weighted`.

The objectives correspond to the following tasks:
```
rank-pal : RANK-PAL
rank-ral : RANK-RAL
pal-preferences-weighted: RANK-PAL + preferences (obtained from D4RL or stored datasets)
snippet-validation-preferences-weighted: preferences only [TREX / max-margin can be configured in the code in f_div.py]
```
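
Since each script accepts only certain objectives, a small pre-flight check can catch a mismatched `obj` before a long run starts. The sketch below copies the compatibility lists from this README. `COMPATIBLE_OBJECTIVES` and `check_objective` are hypothetical names introduced for illustration and are not part of the repository.

```python
# Compatibility table copied from the lists in this README.
COMPATIBLE_OBJECTIVES = {
    "irl.py": {"rank-pal", "rank-ral", "pal-preferences-weighted"},
    "irl_offline.py": {"snippet-validation-preferences-weighted"},
}


def check_objective(script, obj):
    """Return obj if it is listed as compatible with script, else raise ValueError."""
    allowed = COMPATIBLE_OBJECTIVES.get(script)
    if allowed is None:
        raise ValueError(f"unknown script: {script!r}")
    if obj not in allowed:
        raise ValueError(
            f"objective {obj!r} is not compatible with {script}; "
            f"expected one of {sorted(allowed)}"
        )
    return obj
```

For example, `check_objective("irl_offline.py", "rank-pal")` raises a `ValueError`, while `check_objective("irl.py", "rank-pal")` returns the objective unchanged.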