# Diverse-Conventions
This repository explores techniques for generating diverse conventions in multi-agent settings. The CoMeDi algorithm can be found in the `XD` directory. Due to file size limitations, we are unable to submit the trained models during the review process; they will be available when the repository is made public.

## Installation
```
conda create --name DiverseConventions python=3.10
conda activate DiverseConventions
git submodule update --init --recursive
pip install -e .
git clone https://github.com/Stanford-ILIAD/PantheonRL
cd PantheonRL
pip install -e .
cd ..
```

## Tree Environment (Blind Bandits)
To train two conventions with CoMeDi:
```
python serial_trainer.py --num_env_steps 10000 --pop_size 2 --xp_weight 0.5 --mp_weight 0.0 --lr 2e-5 --critic_lr 2e-5 --env_name Tree --run_dir experiment
```

To train two conventions with ADAP:
```
python stat_trainer.py --num_env_steps 10000 --pop_size 2 --loss_type ADAP --loss_param 0.2 --lr 2e-5 --critic_lr 2e-5 --env_name Tree
```

There is also an interactive program in which you play as one of the agents. You can choose the partner type: RAND (partner moves randomly), SAFE (partner strives for a reward of 1), RISKY (partner strives for a reward of 3), or LOAD (trained partner).

To play against the safe agent, run this:

```
python tree_cli.py SAFE
```

To load the first convention (after training):

```
python tree_cli.py LOAD --partner-load ./Tree/results/experiment/1/convention0/models/actor.pt
```

To load the second convention (after training):
```
python tree_cli.py LOAD --partner-load ./Tree/results/experiment/1/convention1/models/actor.pt
```

In general, to load any trained convention, run:
```
python tree_cli.py LOAD --partner-load ./Tree/results/experiment/[SEED#]/convention[CONVENTION#]/models/actor.pt
```
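
The path template above can also be filled in programmatically. The following is a minimal sketch: `actor_path` is a hypothetical helper that is not part of this repository, and it assumes the checkpoint layout is exactly the one shown in the commands above.

```python
def actor_path(env_name: str, run_dir: str, seed: int, convention: int) -> str:
    """Build the checkpoint path for one trained convention.

    Hypothetical helper: it only mirrors the layout
    ./<env>/results/<run_dir>/<seed>/convention<k>/models/actor.pt
    that appears in the commands of this README.
    """
    if seed < 0 or convention < 0:
        raise ValueError("seed and convention must be non-negative")
    return (
        f"./{env_name}/results/{run_dir}/{seed}"
        f"/convention{convention}/models/actor.pt"
    )


if __name__ == "__main__":
    # Second convention from seed 1 of the Tree run named "experiment".
    print("python tree_cli.py LOAD --partner-load",
          actor_path("Tree", "experiment", 1, 1))
```

The printed line can be pasted into a shell as is; conventions are numbered from 0, so a population of size 2 yields `convention0` and `convention1`.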

## Line Environment (Balance Beam)
To train two conventions with CoMeDi:
```
python serial_trainer.py --num_env_steps 125000 --pop_size 2 --xp_weight 0.15 --mp_weight 0.5 --lr 2.5e-4 --critic_lr 2.5e-4 --episode_length 1250 --use_linear_lr_decay --env_length 2 --env_name Line --seed 1
```

To train two conventions with ADAP:

```
python stat_trainer.py --env_name Line --num_env_steps 50000 --lr 2.5e-4 --critic_lr 2.5e-4 --loss_type ADAP --episode_length 1250 --loss_param 0.05 --pop_size 2 --env_length 3 --use_linear_lr_decay
```


There is also an interactive program in which you play as one of the agents. You can choose the partner type: RAND (partner moves randomly), LEFT (partner is biased towards the left to break symmetries), RIGHT (partner is biased towards the right to break symmetries), or LOAD (trained partner).

To play against the left agent, run this:

```
python numline_cli.py LEFT
```

To load the first convention (after training):

```
python numline_cli.py LOAD --partner-load ./Line/results/standard/1/convention0/models/actor.pt
```

To load the second convention (after training):
```
python numline_cli.py LOAD --partner-load ./Line/results/standard/1/convention1/models/actor.pt
```

In general, to load any trained convention, run:
```
python numline_cli.py LOAD --partner-load ./Line/results/standard/[SEED#]/convention[CONVENTION#]/models/actor.pt
```
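
After several training runs it can be handy to see which seeds and conventions actually exist on disk. The sketch below scans a results directory for checkpoints; `find_conventions` is a hypothetical helper, not part of this repository, and it assumes the `<seed>/convention<k>/models/actor.pt` layout used in the commands above.

```python
import re
from pathlib import Path

# Matches <seed>/convention<k>/models/actor.pt at the end of a POSIX-style path.
_PATTERN = re.compile(r"(?:^|/)(\d+)/convention(\d+)/models/actor\.pt$")


def find_conventions(results_dir):
    """Return sorted (seed, convention, path) tuples found under results_dir.

    Hypothetical helper; returns an empty list if nothing matches.
    """
    found = []
    for path in Path(results_dir).glob("*/convention*/models/actor.pt"):
        match = _PATTERN.search(path.as_posix())
        if match:
            seed, convention = int(match.group(1)), int(match.group(2))
            found.append((seed, convention, path.as_posix()))
    return sorted(found)


if __name__ == "__main__":
    # Print one ready-to-run command per checkpoint that was found.
    for seed, convention, path in find_conventions("./Line/results/standard"):
        print(f"# seed {seed}, convention {convention}")
        print(f"python numline_cli.py LOAD --partner-load {path}")
```

Directories whose seed component is not a plain integer are skipped, so unrelated folders inside the results directory do not produce entries.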

## Overcooked Environment

To train self-play (SP):

```
python serial_trainer.py --num_env_steps 400000 --pop_size 1 --xp_weight 0.5 --mp_weight 0.0 --lr 2.0e-4 --critic_lr 2.0e-4 --episode_length 4000 --env_length 200 --use_linear_lr_decay --entropy_coef 0.0 --env_name Overcooked --seed 1 --over_layout simple --run_dir simple_sp --restored 0
python serial_trainer.py --num_env_steps 400000 --pop_size 1 --xp_weight 0.5 --mp_weight 0.0 --lr 2.0e-4 --critic_lr 2.0e-4 --episode_length 4000 --env_length 200 --use_linear_lr_decay --entropy_coef 0.0 --env_name Overcooked --seed 1 --over_layout random1 --run_dir coordination_sp --restored 0
```

To train ADAP:

```
python stat_trainer.py --num_env_steps 400000 --pop_size 4 --loss_type ADAP --loss_param 0.025 --lr 2.0e-4 --critic_lr 2.0e-4 --episode_length 4000 --env_length 200 --use_linear_lr_decay --entropy_coef 0.0 --env_name Overcooked --seed 1 --over_layout simple --run_dir simple0.025ADAP
python stat_trainer.py --num_env_steps 400000 --pop_size 4 --loss_type ADAP --loss_param 0.025 --lr 2.0e-4 --critic_lr 2.0e-4 --episode_length 4000 --env_length 200 --use_linear_lr_decay --entropy_coef 0.0 --env_name Overcooked --seed 1 --over_layout random1 --run_dir coordination0.025ADAP

python oracle_trainer.py --num_env_steps 400000 --pop_size 4 --loss_type ADAP --loss_param 0.025 --lr 2.0e-4 --critic_lr 2.0e-4 --episode_length 4000 --env_length 200 --use_linear_lr_decay --entropy_coef 0.0 --env_name Overcooked --seed 1 --over_layout simple --run_dir simple0.025ADAP
python oracle_trainer.py --num_env_steps 400000 --pop_size 4 --loss_type ADAP --loss_param 0.025 --lr 2.0e-4 --critic_lr 2.0e-4 --episode_length 4000 --env_length 200 --use_linear_lr_decay --entropy_coef 0.0 --env_name Overcooked --seed 1 --over_layout random1 --run_dir coordination0.025ADAP
```

To train CoMeDi0 (XP):

```
python serial_trainer.py --num_env_steps 400000 --pop_size 4 --xp_weight 0.25 --mp_weight 0.0 --lr 2.0e-4 --critic_lr 2.0e-4 --episode_length 4000 --env_length 200 --use_linear_lr_decay --entropy_coef 0.0 --env_name Overcooked --seed 1 --over_layout simple --run_dir simple_xp0.25 --restored 0
python serial_trainer.py --num_env_steps 400000 --pop_size 4 --xp_weight 0.25 --mp_weight 0.0 --lr 2.0e-4 --critic_lr 2.0e-4 --episode_length 4000 --env_length 200 --use_linear_lr_decay --entropy_coef 0.0 --env_name Overcooked --seed 1 --over_layout random1 --run_dir coordination_xp0.25 --restored 0

python oracle_trainer.py --num_env_steps 400000 --pop_size 4 --xp_weight 0.25 --mp_weight 0.0 --lr 2.0e-4 --critic_lr 2.0e-4 --episode_length 4000 --env_length 200 --use_linear_lr_decay --entropy_coef 0.0 --env_name Overcooked --seed 1 --over_layout simple --run_dir simple_xp0.25
python oracle_trainer.py --num_env_steps 400000 --pop_size 4 --xp_weight 0.25 --mp_weight 0.0 --lr 2.0e-4 --critic_lr 2.0e-4 --episode_length 4000 --env_length 200 --use_linear_lr_decay --entropy_coef 0.0 --env_name Overcooked --seed 1 --over_layout random1 --run_dir coordination_xp0.25
```

To train CoMeDi1 (MP):

```
python serial_trainer.py --num_env_steps 400000 --pop_size 4 --xp_weight 0.25 --mp_weight 1.0 --lr 2.0e-4 --critic_lr 2.0e-4 --episode_length 4000 --env_length 200 --use_linear_lr_decay --entropy_coef 0.0 --env_name Overcooked --seed 1 --over_layout simple --run_dir simple_xp0.25_mp1.0 --restored 0
python serial_trainer.py --num_env_steps 400000 --pop_size 4 --xp_weight 0.25 --mp_weight 1.0 --lr 2.0e-4 --critic_lr 2.0e-4 --episode_length 4000 --env_length 200 --use_linear_lr_decay --entropy_coef 0.0 --env_name Overcooked --seed 1 --over_layout random1 --run_dir coordination_xp0.25_mp1.0 --restored 0

python oracle_trainer.py --num_env_steps 400000 --pop_size 4 --xp_weight 0.25 --mp_weight 1.0 --lr 2.0e-4 --critic_lr 2.0e-4 --episode_length 4000 --env_length 200 --use_linear_lr_decay --entropy_coef 0.0 --env_name Overcooked --seed 1 --over_layout simple --run_dir simple_xp0.25_mp1.0
python oracle_trainer.py --num_env_steps 400000 --pop_size 4 --xp_weight 0.25 --mp_weight 1.0 --lr 2.0e-4 --critic_lr 2.0e-4 --episode_length 4000 --env_length 200 --use_linear_lr_decay --entropy_coef 0.0 --env_name Overcooked --seed 1 --over_layout random1 --run_dir coordination_xp0.25_mp1.0
```
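
The `serial_trainer.py` commands in this section differ only in the layout, population size, the two weights, and the run directory. The sketch below rebuilds them from those parts, which may help avoid copy-paste mistakes. `serial_trainer_command`, `SHARED_FLAGS`, and `LAYOUTS` are hypothetical names introduced for illustration; every flag and value is copied from the commands above, and nothing here is part of the repository itself.

```python
import shlex

# Flags that are identical in every Overcooked command in this README.
SHARED_FLAGS = [
    "--lr", "2.0e-4", "--critic_lr", "2.0e-4",
    "--episode_length", "4000", "--env_length", "200",
    "--use_linear_lr_decay", "--entropy_coef", "0.0",
    "--env_name", "Overcooked", "--seed", "1",
]

# Value of --over_layout -> prefix used for --run_dir in this README.
LAYOUTS = {"simple": "simple", "random1": "coordination"}


def serial_trainer_command(layout, pop_size, xp_weight, mp_weight, run_suffix):
    """Rebuild one serial_trainer.py command line from this README.

    Weights are passed as strings so they are emitted exactly as written.
    """
    args = [
        "python", "serial_trainer.py",
        "--num_env_steps", "400000",
        "--pop_size", str(pop_size),
        "--xp_weight", xp_weight,
        "--mp_weight", mp_weight,
        *SHARED_FLAGS,
        "--over_layout", layout,
        "--run_dir", f"{LAYOUTS[layout]}{run_suffix}",
        "--restored", "0",
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # (pop_size, xp_weight, mp_weight, run_dir suffix) for SP, CoMeDi0, CoMeDi1.
    variants = [
        (1, "0.5", "0.0", "_sp"),
        (4, "0.25", "0.0", "_xp0.25"),
        (4, "0.25", "1.0", "_xp0.25_mp1.0"),
    ]
    for layout in LAYOUTS:
        for pop_size, xp, mp, suffix in variants:
            print(serial_trainer_command(layout, pop_size, xp, mp, suffix))
```

Running the script prints the six `serial_trainer.py` commands listed in this section, one per line.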

To try out the website (models already created), run this:
```
python overcooked_env/flask_app.py --modelpath ./overcooked_env/test_models/ --trajs_savepath ./overcooked_env/savepath
```

In general, you can replace `convention0` with whichever convention you want to test.
