# Inverse Factorized Soft Q-Learning for Cooperative Multi-agent Imitation Learning
This paper concerns imitation learning (IL), i.e., the problem of learning to mimic expert behaviors from demonstrations, in cooperative multi-agent systems. The learning problem poses several challenges, including high-dimensional state and action spaces and intricate inter-agent dependencies.

In the single-agent setting, IL can be done efficiently through an inverse soft-Q learning process given expert demonstrations. However, extending this framework to the multi-agent setting requires simultaneously learning local value functions, which capture local observations and individual actions, and a joint value function, which exploits centralized learning. This raises a fundamental question: how can we effectively recover both local and centralized Q functions for imitation learning in multi-agent scenarios?

In this work, we introduce a novel multi-agent IL algorithm designed to address these challenges.

# Code Structure

    ├── algorithms          # implemented algorithms on multi-agent environments
    │   ├── base.py         
    │   ├── bc.py           # Behavior Cloning
    │   ├── gail.py         # Multi-agent Generative Adversarial Imitation Learning
    │   ├── iiq.py          # Independent Inverse soft-Q Learning
    │   ├── iqvdn.py        # IQ-Learn for Value Decomposition Network
    │   ├── mifq.py         # Multi-agent Inverse Factorized Q-Learning
    │   ├── qmix.py         # Monotonic Value Function Factorisation
    │   └── sqil.py         # Multi-agent Imitation Learning via RL
    ├── network
    │   ├── mixer.py        # Hyper network
    │   ├── net.py          # Network layers
    │   └── utils.py
    └── trainer
        ├── base.py
        ├── buffer.py       # Replay buffer
        ├── grf_env.py
        ├── miner_env.py
        ├── mpe_env.py
        ├── runner.py       # Model runner
        ├── trainer.py      # Agent trainer
        └── utils.py

# Installation
- SMACv2: See https://github.com/oxwhirl/smacv2/ and follow its instructions to install StarCraft II and the SMACv2 environment. Currently, Windows is not supported.
- Gold Miner: See https://github.com/xphongvn/rlcomp2020 for more details. In short, `cd miner/heuristic` and run `pip install .` to install this environment.
- MPE: See https://pettingzoo.farama.org/environments/mpe/ to install all MPE scenarios.
- Install `plotly`, `tqdm`, `setproctitle`, etc. for visualization.

# Expert Policies and Demonstrations
- All expert policies are included in this repo. See the folder `./expert_policies`.
- If you cannot find the expert models there, download all of them from this link:
https://drive.google.com/file/d/1lfbfJ8k-gM76kcEGGKN1VzLWVFqrKqVi/view?usp=sharing
- All experts were trained with our own computational resources. We package each model's parameters and code as a Just-In-Time (JIT) compiled module with PyTorch, so loading an expert policy only requires `torch.jit.load(model_path)`.
- For more details, see the `train_expert` option of `main.py`.
- We also provide the script for generating expert demonstrations. See `trainer/runner.py` for how we load the expert JIT models, evaluate them, and collect expert demonstrations.
- Note: storing all expert demonstrations requires a large amount of disk space. Contact us if you want our saved expert data.
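
As a rough illustration of the loading step above, the sketch below lists candidate model files and loads one with `torch.jit.load`. The helper names `find_expert_models` and `load_expert` and the `*.pt` file pattern are assumptions made for this example, not part of the repo; check `./expert_policies` for the actual file names and `trainer/runner.py` for how the experts are really used.

```python
from pathlib import Path


def find_expert_models(root="./expert_policies", pattern="*.pt"):
    """Return all files under `root` matching `pattern`, sorted by path.

    The "*.pt" pattern is an assumption; adjust it to the actual file names.
    """
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


def load_expert(model_path, device="cpu"):
    """Load one JIT-compiled expert and switch it to evaluation mode."""
    import torch  # imported here so that listing files does not require PyTorch

    model = torch.jit.load(str(model_path), map_location=device)
    model.eval()
    return model
```

A typical use would be `experts = [load_expert(p) for p in find_expert_models()]`. The inputs each expert expects depend on the environment and are not covered by this sketch.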

# Train Scripts
- Run `python -u main.py --task <task-name>` to train on the task *<task-name>*.
- For example, to train all SMACv2 tasks, run the following commands:
    - `python -u main.py --task protoss_5_vs_5`
    - `python -u main.py --task protoss_10_vs_10`
    - `python -u main.py --task terran_5_vs_5`
    - `python -u main.py --task terran_10_vs_10`
    - `python -u main.py --task zerg_5_vs_5`
    - `python -u main.py --task zerg_10_vs_10`
- To train Gold Miner tasks:
    - `python -u main.py --task miner_easy_2_vs_2`
    - `python -u main.py --task miner_medium_2_vs_2`
    - `python -u main.py --task miner_hard_2_vs_2`
- To train MPE tasks:
    - `python -u main.py --task simple_speaker_listener`
    - `python -u main.py --task simple_spread`
    - `python -u main.py --task simple_reference`
- All experiments are logged to the `logs` folder during training.
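
To run several of the tasks listed above back to back, a small driver script along the following lines could be used. This is only a sketch: the helper names `build_command` and `run_tasks` are hypothetical, and it assumes the script is started from the repo root so that `main.py` is found. Only the `--task` flag shown above is used.

```python
import subprocess
import sys

# Task names copied from the lists above.
SMACV2_TASKS = [
    "protoss_5_vs_5", "protoss_10_vs_10",
    "terran_5_vs_5", "terran_10_vs_10",
    "zerg_5_vs_5", "zerg_10_vs_10",
]
MINER_TASKS = ["miner_easy_2_vs_2", "miner_medium_2_vs_2", "miner_hard_2_vs_2"]
MPE_TASKS = ["simple_speaker_listener", "simple_spread", "simple_reference"]


def build_command(task):
    """Build the argument list equivalent to `python -u main.py --task <task>`."""
    return [sys.executable, "-u", "main.py", "--task", task]


def run_tasks(tasks, dry_run=True):
    """Run the tasks one after another; with dry_run=True only print the commands."""
    commands = [build_command(task) for task in tasks]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop as soon as one training run fails.
            subprocess.run(cmd, check=True)
    return commands
```

For example, `run_tasks(MPE_TASKS)` prints the three MPE commands, and `run_tasks(MPE_TASKS, dry_run=False)` actually launches them sequentially.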

# Evaluation and Visualization
- See the notebook `benchmark.ipynb` for evaluation and visualization.
- We also share *saved results* (stored as a pickle file) for reproducibility.
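
Outside the notebook, the saved results can be opened with Python's standard `pickle` module. The sketch below is a generic loader, not code from the repo: `load_results` and `summarize` are hypothetical helpers, and since the layout of the results file is not documented here, `summarize` only reports the types of the top-level entries. Only unpickle files from a source you trust, because loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path


def load_results(path):
    """Load a pickled results object from `path`."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def summarize(results, max_items=10):
    """Map top-level keys to value type names, without assuming a schema."""
    if isinstance(results, dict):
        items = list(results.items())[:max_items]
        return {str(key): type(value).__name__ for key, value in items}
    # Not a dict: just report the type of the whole object.
    return {"<root>": type(results).__name__}
```

Calling `summarize(load_results(path))` on the results file gives a quick overview of its top-level entries before exploring them further in `benchmark.ipynb`.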

# Contact us
- Will be revealed later