# Adversarial Policy Transfer in Mixed Cooperative-Competitive Games

This repository contains the code for our transferable adversarial policy framework (Ours) and for the baseline methods UPDeT, MATTER, DT2GS, and SUB-PLAY. It also includes implementations of mixed cooperative-competitive game environments: **SMAC**, **MAgent**, and **SMACv2**.

## Supported Algorithms

* Transferable multi-agent reinforcement learning algorithms:
  * UPDeT
  * MATTER
  * DT2GS
* Adversarial policy in the multi-agent setting:
  * SUB-PLAY
* Transferable adversarial policy:
  * Ours


## Supported Environments

* SMACDual, SMAC
* MAgentDual, MAgent
* SMACv2Dual, SMACv2

## How to run the code

### Install Dependencies

* Create Conda Environment

```bash
# This will create an Anaconda environment named amb.
conda env create -f amb.yml
```

* Install StarCraft II

Change to the directory where you want to install StarCraft II, then run the following commands:

```bash
wget https://blzdistsc2-a.akamaihd.net/Linux/SC2.4.10.zip
unzip -P iagreetotheeula SC2.4.10.zip 
rm -rf SC2.4.10.zip

cd StarCraftII/
wget https://raw.githubusercontent.com/Blizzard/s2client-proto/master/stableid.json
```

Add the following line to `~/.bashrc`:

```bash
export SC2PATH="/path/to/your/StarCraftII"
```

Copy the `amb/envs/smac/SMAC_Maps` directory to `StarCraftII/Maps`.
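
Before training, it can help to confirm that the installation steps above took effect. The following is a minimal sketch, not part of the repository: the helper name `check_sc2_install` is hypothetical, and it assumes that copying `SMAC_Maps` results in a `Maps/SMAC_Maps` directory under `SC2PATH`.

```python
import os
from pathlib import Path


def check_sc2_install(environ=None):
    """Return a list of problems found; an empty list means the layout looks fine.

    Hypothetical helper for illustration only; it is not shipped with this repository.
    """
    environ = os.environ if environ is None else environ
    sc2path = environ.get("SC2PATH")
    if not sc2path:
        return ["SC2PATH is not set"]

    root = Path(sc2path)
    problems = []
    if not root.is_dir():
        problems.append(f"SC2PATH does not point to a directory: {root}")
    # Assumption: SMAC_Maps was copied into StarCraftII/Maps as described above.
    elif not (root / "Maps" / "SMAC_Maps").is_dir():
        problems.append(f"missing map directory: {root / 'Maps' / 'SMAC_Maps'}")
    return problems


if __name__ == "__main__":
    for problem in check_sc2_install():
        print(problem)
```

Run it in a fresh shell after editing `~/.bashrc`, so that the exported variable is visible to the Python process.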

* Install MAgent

Install using pip: `pip install magent2`. 

### Train the victims and Save the models

The code uses different parameters for different algorithms and environments. The default parameters are located in `./amb/configs/`. Algorithm parameters are stored in YAML files named `{algorithm}.yaml` within the `./amb/configs/algos_cfgs/` directory. Environment parameters are saved in YAML files named `{env}.yaml` within the `./amb/configs/envs_cfgs/` directory:

* `{algorithm}`:
  * MAPPO: mappo
  * QMIX: qmix
* `{env}`:
  * SMAC: smac
  * MAgent: magents
  * SMACv2: smacv2
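
As a quick illustration of the naming scheme above, the sketch below maps an algorithm name and an environment name to the two default config files. The function `config_paths` is a hypothetical helper, not something the repository provides; only the directory names come from the description above.

```python
from pathlib import Path

# Directory layout taken from the description above.
CONFIG_ROOT = Path("amb/configs")


def config_paths(algo, env):
    """Return (algorithm config, environment config) for the given names.

    Hypothetical helper for illustration; the training scripts resolve
    these files themselves.
    """
    algo_cfg = CONFIG_ROOT / "algos_cfgs" / f"{algo}.yaml"
    env_cfg = CONFIG_ROOT / "envs_cfgs" / f"{env}.yaml"
    return algo_cfg, env_cfg


if __name__ == "__main__":
    for path in config_paths("mappo", "smac"):
        print(path)
```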

To train the agents, use a command like the following, which trains the victim on the *3m* map of SMAC:

```bash
python -u single_train.py --algo mappo --env smac --run single --exp_name 3m-victim --env.map_name 3m --seed 1
```

The victim policies are then saved in the `results/smac/3m/single/mappo/3m-victim/seed-00001/models` directory.
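
The output directory appears to be assembled from the command-line arguments. The sketch below reproduces the path shown above; the layout `results/{env}/{map}/{run}/{algo}/{exp_name}/seed-{seed:05d}/models` is inferred from this single example and is an assumption, and `victim_model_dir` is a hypothetical helper.

```python
from pathlib import Path


def victim_model_dir(env, map_name, run, algo, exp_name, seed, root="results"):
    """Build the directory where victim models are expected to be saved.

    Assumption: the layout is inferred from the single example in this README.
    """
    # The seed is zero-padded to five digits, e.g. 1 -> "seed-00001".
    return Path(root) / env / map_name / run / algo / exp_name / f"seed-{seed:05d}" / "models"


if __name__ == "__main__":
    print(victim_model_dir("smac", "3m", "single", "mappo", "3m-victim", 1))
```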

### Train the transferable adversarial agents and Save the models 

To train the transferable adversarial agents on a reference map (here *3m*) for one training period, use one of the following commands:

```bash
# UPDeT
CUDA_VISIBLE_DEVICES=1 python -u dual_train.py --env smac_dual --exp_name 3m-2s3z-3s_vs_3z-8m-3s_vs_5z-updet-epoch1-map1 --angel mappo --run dual --env.map_name 3m_dual --angel.num_env_steps 1000000 --load_demon ./results/smac_dual/3m_dual/dual/mappo-mappo/3m-victim/seed-00001 --angel.load_critic False --angel.actor_use_updet True --angel.env_belief False --env.multi_map_alignment True --multi_map_list 3m_dual 2s3z_dual 3s_vs_3z_dual 8m_dual 3s_vs_5z_dual

# MATTER
python -u dual_train.py --env smac_dual --exp_name 3m-2s3z-3s_vs_3z-8m-3s_vs_5z-matter-epoch1-map1 --angel mappo --run dual --env.map_name 3m_dual --angel.num_env_steps 1000000 --load_demon ./results/smac_dual/3m_dual/dual/mappo-mappo/3m-victim/seed-00001 --angel.load_critic False --angel.actor_use_updet True --angel.env_belief True --angel.env_belief_matter True --angel.env_prior_path ./matter_env_prior/3m-2s3z-3s_vs_3z-8m-3s_vs_5z/3m.npy --env.multi_map_alignment True --multi_map_list 3m_dual 2s3z_dual 3s_vs_3z_dual 8m_dual 3s_vs_5z_dual

# DT2GS
python -u dual_train.py --env smac_dual --exp_name 3m-2s3z-3s_vs_3z-8m-3s_vs_5z-dt2gs-epoch1-map1 --angel mappo --run dual --env.map_name 3m_dual --angel.num_env_steps 1000000 --load_demon ./results/smac_dual/3m_dual/dual/mappo-mappo/3m-victim/seed-00001 --angel.load_critic False --angel.actor_use_updet True --angel.env_belief False --angel.actor_use_dt2gs True --env.multi_map_alignment True --multi_map_list 3m_dual 2s3z_dual 3s_vs_3z_dual 8m_dual 3s_vs_5z_dual

# SUB-PLAY
python -u dual_train.py --env smac_dual --exp_name 3m-2s3z-3s_vs_3z-8m-3s_vs_5z-subplay-epoch1-map1 --angel mappo --run dual --env.map_name 3m_dual --angel.num_env_steps 1000000 --load_demon ./results/smac_dual/3m_dual/dual/mappo-mappo/3m-victim/seed-00001 --angel.load_critic False --angel.actor_use_updet True --angel.env_belief False --angel.actor_divide_conquer True --angel.actor_use_subplay True --env.multi_map_alignment True --multi_map_list 3m_dual 2s3z_dual 3s_vs_3z_dual 8m_dual 3s_vs_5z_dual

# Ours
python -u dual_train.py --env smac_dual --exp_name 3m-2s3z-3s_vs_3z-8m-3s_vs_5z-llm-epoch1-map1 --angel mappo --run dual --env.map_name 3m_dual --angel.num_env_steps 1000000 --load_demon ./results/smac_dual/3m_dual/dual/mappo-mappo/3m-victim/seed-00001 --angel.load_critic False --angel.actor_use_updet True --angel.env_belief True --angel.actor_divide_conquer True --angel.env_prior_path ./llm_env_prior/3m-2s3z-3s_vs_3z-8m-3s_vs_5z/3m.npy --env.multi_map_alignment True --multi_map_list 3m_dual 2s3z_dual 3s_vs_3z_dual 8m_dual 3s_vs_5z_dual 2m_vs_1z_dual 6m_dual 3s_vs_4z_dual 3s5z_dual
```

* `--env`: the environment, `smac_dual` for SMAC and `magents_dual` for MAgent.
* `--env.map_name`: the map name. If none is given, the map name from the environment's config file is used instead.
* `--exp_name`: the experiment name.
* `--seed`: the random seed.
* `--angel.num_env_steps`: the total number of environment timesteps for this training period.
* `--load_demon`: the experiment directory of the victim policy. The weights of the victim policies are loaded from `{load_demon}/models/`.
* `--angel.load_critic`: set to `False` for all methods, as in the commands above.
* `--angel.actor_use_updet`: set to `True` for all methods.
* `--angel.env_belief`: set to `True` for `MATTER` and `Ours`, `False` for other methods.
* `--angel.env_prior_path`: path to the ground-truth scenario type, stored as a 1D `numpy.ndarray` that sums to 1. Used by `MATTER` and `Ours`.
* `--angel.actor_divide_conquer`: set to `True` for `SUB-PLAY` and `Ours`, `False` for other methods.
* `--angel.env_belief_matter`: set to `True` for `MATTER`, `False` for other methods.
* `--angel.actor_use_dt2gs`: set to `True` for `DT2GS`, `False` for other methods.
* `--angel.actor_use_subplay`: set to `True` for `SUB-PLAY`, `False` for other methods.
* `--env.multi_map_alignment`: set to `True` if the set of training environments involves multiple types of agents.
* `--multi_map_list`: the set of training environments. Used when `--env.multi_map_alignment` is set to `True`.
* `--angel.model_dir`: the checkpoint of the adversarial policy from the previous training scenario.
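
The method-specific flags can be summarized as a lookup table. The sketch below mirrors the values used in the example commands above; the dictionary, the lowercase method keys, and the function `method_args` are hypothetical names introduced for illustration and are not part of the repository.

```python
# Flags shared by every method in the example commands above.
COMMON_FLAGS = {"angel.load_critic": False, "angel.actor_use_updet": True}

# Method-specific flags, copied from the example commands (keys are hypothetical labels).
METHOD_FLAGS = {
    "updet": {"angel.env_belief": False},
    "matter": {"angel.env_belief": True, "angel.env_belief_matter": True},
    "dt2gs": {"angel.env_belief": False, "angel.actor_use_dt2gs": True},
    "subplay": {
        "angel.env_belief": False,
        "angel.actor_divide_conquer": True,
        "angel.actor_use_subplay": True,
    },
    "ours": {"angel.env_belief": True, "angel.actor_divide_conquer": True},
}


def method_args(method, env_prior_path=None):
    """Return the command-line arguments that select the given method."""
    flags = {**COMMON_FLAGS, **METHOD_FLAGS[method]}
    if flags["angel.env_belief"]:
        # MATTER and Ours read a scenario prior from disk.
        if env_prior_path is None:
            raise ValueError(f"{method} needs --angel.env_prior_path")
        flags["angel.env_prior_path"] = env_prior_path

    args = []
    for name, value in flags.items():
        args += [f"--{name}", str(value)]
    return args


if __name__ == "__main__":
    print(" ".join(method_args("subplay")))
```

The returned list can be appended to the shared part of a `dual_train.py` command, so that only the method label changes between runs.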

The models are saved in directories like:

```bash
models: ./results/{env}/{map}/dual/mappo-{victim-algo}/{exp_name}/{seed}/models/
```

We use a curriculum for multi-task training: each stage trains on one map and resumes from the previous stage's checkpoint via `--angel.model_dir`. The whole training pipeline may look like:

```bash
python -u dual_train.py --env smac_dual --exp_name 3m-2s3z-3s_vs_3z-8m-3s_vs_5z-llm-epoch1-map1 --angel mappo --run dual --env.map_name 3m_dual --angel.num_env_steps 1000000 --load_demon ./results/smac_dual/3m_dual/dual/mappo-mappo/3m-victim/seed-00001 --angel.load_critic False --angel.actor_use_updet True --angel.env_belief True --angel.actor_divide_conquer True --angel.episode_length 100 --angel.env_prior_path ./llm_env_prior/3m-2s3z-3s_vs_3z-8m-3s_vs_5z/3m.npy --env.multi_map_alignment True --multi_map_list 3m_dual 2s3z_dual 3s_vs_3z_dual 8m_dual 3s_vs_5z_dual

python -u dual_train.py --env smac_dual --exp_name 3m-2s3z-3s_vs_3z-8m-3s_vs_5z-llm-epoch1-map2 --angel mappo --run dual --env.map_name 2s3z_dual --angel.num_env_steps 1000000 --load_demon ./results/smac_dual/2s3z_dual/dual/mappo-mappo/2s3z-victim/seed-00001 --angel.load_critic False --angel.actor_use_updet True --angel.env_belief True --angel.actor_divide_conquer True --angel.episode_length 100 --angel.env_prior_path ./llm_env_prior/3m-2s3z-3s_vs_3z-8m-3s_vs_5z/2s3z.npy --angel.model_dir ./results/smac_dual/3m_dual/dual/mappo-mappo/3m-2s3z-3s_vs_3z-8m-3s_vs_5z-llm-epoch1-map1/seed-00001/models/angel --env.multi_map_alignment True --multi_map_list 3m_dual 2s3z_dual 3s_vs_3z_dual 8m_dual 3s_vs_5z_dual

python -u dual_train.py --env smac_dual --exp_name 3m-2s3z-3s_vs_3z-8m-3s_vs_5z-llm-epoch1-map3 --angel mappo --run dual --env.map_name 3s_vs_3z_dual --angel.num_env_steps 1000000 --load_demon ./results/smac_dual/3s_vs_3z_dual/dual/mappo-mappo/3s_vs_3z-victim/seed-00001 --angel.load_critic False --angel.actor_use_updet True --angel.env_belief True --angel.actor_divide_conquer True --angel.episode_length 100 --angel.env_prior_path ./llm_env_prior/3m-2s3z-3s_vs_3z-8m-3s_vs_5z/3s_vs_3z.npy --angel.model_dir ./results/smac_dual/2s3z_dual/dual/mappo-mappo/3m-2s3z-3s_vs_3z-8m-3s_vs_5z-llm-epoch1-map2/seed-00001/models/angel --env.multi_map_alignment True --multi_map_list 3m_dual 2s3z_dual 3s_vs_3z_dual 8m_dual 3s_vs_5z_dual 2m_vs_1z_dual

python -u dual_train.py --env smac_dual --exp_name 3m-2s3z-3s_vs_3z-8m-3s_vs_5z-llm-epoch1-map4 --angel mappo --run dual --env.map_name 8m_dual --angel.num_env_steps 1000000 --load_demon ./results/smac_dual/8m_dual/dual/mappo-mappo/8m-victim/seed-00001 --angel.load_critic False --angel.actor_use_updet True --angel.env_belief True --angel.actor_divide_conquer True --angel.episode_length 100 --angel.env_prior_path ./llm_env_prior/3m-2s3z-3s_vs_3z-8m-3s_vs_5z/8m.npy --angel.model_dir ./results/smac_dual/3s_vs_3z_dual/dual/mappo-mappo/3m-2s3z-3s_vs_3z-8m-3s_vs_5z-llm-epoch1-map3/seed-00001/models/angel --env.multi_map_alignment True --multi_map_list 3m_dual 2s3z_dual 3s_vs_3z_dual 8m_dual 3s_vs_5z_dual

python -u dual_train.py --env smac_dual --exp_name 3m-2s3z-3s_vs_3z-8m-3s_vs_5z-llm-epoch1-map5 --angel mappo --run dual --env.map_name 3s_vs_5z_dual --angel.num_env_steps 1000000 --load_demon ./results/smac_dual/3s_vs_5z_dual/dual/mappo-mappo/3s_vs_5z-victim/seed-00001 --angel.load_critic False --angel.actor_use_updet True --angel.env_belief True --angel.actor_divide_conquer True --angel.episode_length 100 --angel.env_prior_path ./llm_env_prior/3m-2s3z-3s_vs_3z-8m-3s_vs_5z/3s_vs_5z.npy --angel.model_dir ./results/smac_dual/8m_dual/dual/mappo-mappo/3m-2s3z-3s_vs_3z-8m-3s_vs_5z-llm-epoch1-map4/seed-00001/models/angel --env.multi_map_alignment True --multi_map_list 3m_dual 2s3z_dual 3s_vs_3z_dual 8m_dual 3s_vs_5z_dual
```
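
Because each stage differs from the previous one only in the map name, the victim directory, the prior file, and the checkpoint to resume from, the chain can be generated programmatically. The following sketch rebuilds the five commands above as argument lists; `curriculum_commands` is a hypothetical helper, and the path patterns are inferred from those commands rather than from the training code.

```python
import shlex

MAPS = ["3m", "2s3z", "3s_vs_3z", "8m", "3s_vs_5z"]
SEED_DIR = "seed-00001"  # matches the seed directory used in the commands above


def curriculum_commands(maps=MAPS, method="llm", epoch=1, steps=1000000):
    """Build one dual_train.py command per map, chaining checkpoints between stages."""
    tag = "-".join(maps)
    commands = []
    prev_model_dir = None
    for index, name in enumerate(maps, start=1):
        exp_name = f"{tag}-{method}-epoch{epoch}-map{index}"
        base = f"./results/smac_dual/{name}_dual/dual/mappo-mappo"
        cmd = [
            "python", "-u", "dual_train.py",
            "--env", "smac_dual", "--exp_name", exp_name,
            "--angel", "mappo", "--run", "dual",
            "--env.map_name", f"{name}_dual",
            "--angel.num_env_steps", str(steps),
            "--load_demon", f"{base}/{name}-victim/{SEED_DIR}",
            "--angel.load_critic", "False", "--angel.actor_use_updet", "True",
            "--angel.env_belief", "True", "--angel.actor_divide_conquer", "True",
            "--angel.episode_length", "100",
            "--angel.env_prior_path", f"./llm_env_prior/{tag}/{name}.npy",
        ]
        if prev_model_dir is not None:
            # Resume from the adversary trained in the previous stage.
            cmd += ["--angel.model_dir", prev_model_dir]
        cmd += ["--env.multi_map_alignment", "True", "--multi_map_list"]
        cmd += [f"{m}_dual" for m in maps]
        commands.append(cmd)
        prev_model_dir = f"{base}/{exp_name}/{SEED_DIR}/models/angel"
    return commands


if __name__ == "__main__":
    for cmd in curriculum_commands():
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to run the stages sequentially; printing them first makes it easy to compare against the commands above.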

### Zero-shot Attack in the previously unseen scenario

To attack the models, use one of the following commands (this example attacks the *2m_vs_1z* map):

```bash
# UPDeT
python -u dual_train.py --env smac_dual --exp_name test --angel mappo --run dual --env.map_name 2m_vs_1z_dual --load_demon ./results/smac_dual/2m_vs_1z_dual/dual/mappo-mappo/2m_vs_1z-victim/seed-00001 --angel.model_dir ./results/smac_dual/3s_vs_5z_dual/dual/mappo-mappo/3m-2s3z-3s_vs_3z-8m-3s_vs_5z-updet-epoch1-map5/seed-00001/models/angel --angel.eval_only True --angel.load_critic False --angel.actor_use_updet True --angel.env_belief False --env.multi_map_alignment True --multi_map_list 3m_dual 2s3z_dual 3s_vs_3z_dual 8m_dual 3s_vs_5z_dual

# MATTER
python -u dual_train.py --env smac_dual --exp_name test --angel mappo --run dual --env.map_name 2m_vs_1z_dual --load_demon ./results/smac_dual/2m_vs_1z_dual/dual/mappo-mappo/2m_vs_1z-victim/seed-00001 --angel.model_dir ./results/smac_dual/3s_vs_5z_dual/dual/mappo-mappo/3m-2s3z-3s_vs_3z-8m-3s_vs_5z-matter-epoch1-map5/seed-00001/models/angel --angel.eval_only True --angel.load_critic False --angel.actor_use_updet True --angel.env_belief True --angel.env_belief_matter True --angel.env_prior_path ./matter_env_prior/3m-2s3z-3s_vs_3z-8m-3s_vs_5z/test.npy --angel.matter_transfer_test True --env.multi_map_alignment True --multi_map_list 3m_dual 2s3z_dual 3s_vs_3z_dual 8m_dual 3s_vs_5z_dual

# DT2GS
python -u dual_train.py --env smac_dual --exp_name test --angel mappo --run dual --env.map_name 2m_vs_1z_dual --load_demon ./results/smac_dual/2m_vs_1z_dual/dual/mappo-mappo/2m_vs_1z-victim/seed-00001 --angel.model_dir ./results/smac_dual/3s_vs_5z_dual/dual/mappo-mappo/3m-2s3z-3s_vs_3z-8m-3s_vs_5z-dt2gs-epoch1-map5/seed-00001/models/angel --angel.eval_only True --angel.load_critic False --angel.actor_use_updet True --angel.env_belief False --angel.actor_use_dt2gs True --env.multi_map_alignment True --multi_map_list 3m_dual 2s3z_dual 3s_vs_3z_dual 8m_dual 3s_vs_5z_dual

# SUB-PLAY
python -u dual_train.py --env smac_dual --exp_name test --angel mappo --run dual --env.map_name 2m_vs_1z_dual --load_demon ./results/smac_dual/2m_vs_1z_dual/dual/mappo-mappo/2m_vs_1z-victim/seed-00001 --angel.model_dir ./results/smac_dual/3s_vs_5z_dual/dual/mappo-mappo/3m-2s3z-3s_vs_3z-8m-3s_vs_5z-subplay-epoch1-map5/seed-00001/models/angel --angel.eval_only True --angel.load_critic False --angel.actor_use_updet True --angel.env_belief False --angel.actor_divide_conquer True --angel.actor_use_subplay True --env.multi_map_alignment True --multi_map_list 3m_dual 2s3z_dual 3s_vs_3z_dual 8m_dual 3s_vs_5z_dual

# Ours
python -u dual_train.py --env smac_dual --exp_name test --angel mappo --run dual --env.map_name 2m_vs_1z_dual --load_demon ./results/smac_dual/2m_vs_1z_dual/dual/mappo-mappo/2m_vs_1z-victim/seed-00001 --angel.model_dir ./results/smac_dual/3s_vs_5z_dual/dual/mappo-mappo/3m-2s3z-3s_vs_3z-8m-3s_vs_5z-llm-epoch1-map5/seed-00001/models/angel --angel.eval_only True --angel.load_critic False --angel.actor_use_updet True --angel.env_belief True --angel.actor_divide_conquer True --angel.env_prior_path ./llm_env_prior/3m-2s3z-3s_vs_3z-8m-3s_vs_5z/uniform-prior.npy --env.multi_map_alignment True --multi_map_list 3m_dual 2s3z_dual 3s_vs_3z_dual 8m_dual 3s_vs_5z_dual
```

* `--angel.matter_transfer_test`: Searching for the optimal task embedding. Set to `True` for `MATTER`, `False` for other methods.
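
The `Ours` command above loads `uniform-prior.npy` for the unseen map. As a hedged illustration of what a valid prior looks like (a 1D array that sums to 1), the sketch below builds a uniform distribution and checks the constraints in plain Python. The function names and the number of scenario types are assumptions; the actual contents of the shipped prior files may differ.

```python
import math


def uniform_prior(num_types):
    """Return a uniform distribution over `num_types` scenario types."""
    if num_types <= 0:
        raise ValueError("num_types must be positive")
    return [1.0 / num_types] * num_types


def is_valid_prior(values, tol=1e-6):
    """Check that the values are non-negative and sum to 1 within `tol`."""
    values = list(values)
    return (
        len(values) > 0
        and all(v >= 0.0 for v in values)
        and math.isclose(sum(values), 1.0, abs_tol=tol)
    )


if __name__ == "__main__":
    prior = uniform_prior(5)  # 5 is an arbitrary example size
    print(prior, is_valid_prior(prior))
    # To store it in the expected format, one could use NumPy, e.g.
    # numpy.save("uniform-prior.npy", numpy.asarray(prior)).
```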

## Demo Videos

We record the behavior of the agents under attack. The videos compare our method with the baseline methods in the transfer setting of training on *3m* and *8m* and testing on the other marine-series maps of SMAC. The `./videos` directory visualizes performance on the seen task `8m` and the unseen task `11m`.