This repository contains the code to run KPG, an algorithm described in the paper 'K-Level Policy Gradients for Multi-Agent Reinforcement Learning'. This code can run KMAPPO, KFACMAC, KMADDPG, and all the baselines in the paper. The JAX and Torch experiments use separate virtual environments for ease of use.

# Installation

To install the dependencies for the Torch experiments, install conda and run the following commands in the `torch_experiments` folder:

```bash
conda env create --name kpg_torch --file environment_torch.yml python=3.11.7
conda activate kpg_torch
pip install k_level_policy_gradients
```

To install the dependencies for the JAX experiments, install conda and run the following commands in the `jax_experiments` folder:

```bash
conda env create --name kpg_jax --file environment_jax.yml python=3.11.7
conda activate kpg_jax
pip install kpg_jax
```

# Usage

Log in to Weights & Biases (`wandb login`) before launching so that runs are logged live.

__Torch__: Experiments are run with the top-level script `launch_experiments.py`. This part of the repository is based on MushroomRL[1] and extends it to multi-agent environments. The script lets you choose the base environment (MAMuJoCo or SMAC), the scenario within that environment, the agent, the hyperparameters of the agent and the environment, and the number of parallel seeds.

By default, `launch_experiments.py` is set up to run 1 seed of K2-FACMAC on MMM in SMAC. The default settings used in the paper for the environments and agents are in `k_level_policy_gradients/src/configs/*.yaml`.

__JAX__: Each algorithm lives in its own folder with a Python script and a config file. Run the script with `python` to use the provided config, and use Hydra overrides to change settings. For example, to run 5 seeds of K2-MAPPO on 3s_vs_5z with the settings in the paper, go to `kpg/jax_experiments/kpg_jax/src/jax_experiments/smax/ppo/kmappo/ik2mappo_inner` and run:

```bash
conda activate kpg_jax
python ik2m_inner.py MAP_NAME=3s_vs_5z
```
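
To find out which keys can be overridden, Hydra's `--cfg job` flag prints the composed job config instead of running the job. The helper below is a hypothetical convenience wrapper (the name `show_config` is not part of this repository), and it assumes the script is a standard Hydra application, as the override syntax above suggests.

```bash
# Hypothetical helper, not part of the repository: print the composed Hydra
# job config for a script and its overrides without starting a training run.
show_config() {
    local script="$1"
    shift
    # --cfg job tells Hydra to print the job config and exit
    python "$script" "$@" --cfg job
}
```

For example, `show_config ik2m_inner.py MAP_NAME=3s_vs_5z`, run from the script's folder inside the `kpg_jax` environment, should print the settings that the training command above would use.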

The JAX experiments are based on JaxMARL[2]. Note that they are designed to run only on Nvidia GPUs; they will not work with other GPUs or on CPUs.
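
Given the GPU requirement, a quick pre-flight check can save a failed launch. The sketch below only tests whether the NVIDIA driver tool `nvidia-smi` is on the `PATH` and lists at least one device. It assumes a standard driver installation and does not by itself confirm that JAX was installed with CUDA support.

```bash
# Pre-flight sketch: is an NVIDIA GPU visible to the driver?
# nvidia-smi -L prints one line per detected GPU.
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L 2>/dev/null | grep -q "GPU"; then
    gpu_status="found"
else
    gpu_status="missing"
fi
echo "NVIDIA GPU: ${gpu_status}"
```

Inside the `kpg_jax` environment, `python -c "import jax; print(jax.devices())"` additionally shows which devices JAX itself will use.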

[1] D'Eramo, Carlo, et al. "MushroomRL: Simplifying reinforcement learning research." Journal of Machine Learning Research 22.131 (2021): 1-5.

[2] Rutherford, Alexander, et al. "JaxMARL: Multi-agent RL environments in JAX." arXiv preprint arXiv:2311.10090 (2023).