# Breaking the Curse of Dimensionality in Multiagent State Space: A Unified Agent Permutation Framework

## 1. Model Architecture of HyPerNetwork (HPN)

![Agent permutation invariant network with hypernetworks](./doc/figure/API-HPN.png)

HPN incorporates [hypernetworks](https://arxiv.org/pdf/1609.09106) to generate different
weights ![](http://latex.codecogs.com/svg.latex?W_i) for different input
components ![](http://latex.codecogs.com/svg.latex?x_i), which improves representational capacity while ensuring that the
same ![](http://latex.codecogs.com/svg.latex?x_i) is always assigned the same
weight ![](http://latex.codecogs.com/svg.latex?W_i). The architecture of HPN is shown in Figure (b) above. We
take ![](http://latex.codecogs.com/svg.latex?Q_i(o_i)) as an example. The model mainly consists of two modules:

**Permutation Invariant Input Layer.** [Hypernetworks](https://arxiv.org/pdf/1609.09106) are a family of neural
architectures that use one network, known as a hypernetwork, to generate the weights for another network. In our setting,
the hypernetwork is used to generate a different ![](http://latex.codecogs.com/svg.latex?W_i) for
each ![](http://latex.codecogs.com/svg.latex?x_i) of the input set ![](http://latex.codecogs.com/svg.latex?X_j). As
shown in Figure (b) above, ![](http://latex.codecogs.com/svg.latex?X_j) (which can be viewed as a batch
of ![](http://latex.codecogs.com/svg.latex?m) components ![](http://latex.codecogs.com/svg.latex?x_i), each of
dimension ![](http://latex.codecogs.com/svg.latex?k), represented by different shades of blue) is first fed into a
shared hypernetwork (marked in yellow), whose input size is ![](http://latex.codecogs.com/svg.latex?k) and output size
is ![](http://latex.codecogs.com/svg.latex?k*h). The corresponding outputs are then reshaped
to ![](http://latex.codecogs.com/svg.latex?\[k,h\]) and serve as the submodule
weights ![](http://latex.codecogs.com/svg.latex?W_i) of a normal FC layer (see Figure (a)). Note that
different ![](http://latex.codecogs.com/svg.latex?x_i) generate
different ![](http://latex.codecogs.com/svg.latex?W_i), and the same ![](http://latex.codecogs.com/svg.latex?x_i)
always corresponds to the same ![](http://latex.codecogs.com/svg.latex?W_i). Next,
each ![](http://latex.codecogs.com/svg.latex?x_i) is multiplied by its ![](http://latex.codecogs.com/svg.latex?W_i), and all
multiplication results and the bias ![](http://latex.codecogs.com/svg.latex?b) are summed to produce the output.
Since each element ![](http://latex.codecogs.com/svg.latex?x_i) is processed separately by its
corresponding ![](http://latex.codecogs.com/svg.latex?W_i) and the results are merged by a permutation invariant 'sum' function,
permutation invariance is preserved.
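
The input layer described above can be illustrated with a minimal NumPy sketch. It is not the repository's implementation: the hypernetwork is assumed here to be a single linear map, and the names `hyper_w`, `hyper_b`, and `pi_input_layer` are hypothetical. The sketch only shows that generating ![](http://latex.codecogs.com/svg.latex?W_i) from ![](http://latex.codecogs.com/svg.latex?x_i) and summing the products gives the same output for any ordering of the inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, h = 4, 3, 5  # m input components, each of dimension k; output size h

# Assumption: a single-layer hypernetwork mapping one x_i (size k) to k*h values.
hyper_w = rng.normal(size=(k, k * h))
hyper_b = rng.normal(size=(k * h,))
b = rng.normal(size=(h,))  # bias b shared by the whole layer


def pi_input_layer(X):
    """X has shape [m, k]; returns a vector of size h."""
    # Each x_i generates its own weight matrix W_i, reshaped to [k, h].
    W = (X @ hyper_w + hyper_b).reshape(-1, k, h)
    # Compute x_i W_i for every i; the result has shape [m, h].
    per_component = np.einsum("mk,mkh->mh", X, W)
    # Summing over the components does not depend on their order.
    return per_component.sum(axis=0) + b


X = rng.normal(size=(m, k))
perm = rng.permutation(m)
out = pi_input_layer(X)
out_permuted = pi_input_layer(X[perm])
print(np.allclose(out, out_permuted))  # True
```

Because the weights are a function of each ![](http://latex.codecogs.com/svg.latex?x_i) rather than of its position, shuffling the rows of `X` only reorders the terms of the sum.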

**Permutation Equivariant Output Layer.** Similarly, to keep the whole network permutation equivariant, the submodule
weights and biases of the agent-related actions in the output layer,
e.g., ![](http://latex.codecogs.com/svg.latex?\mathcal{A}_i^\text{attack}) of SMAC, are also generated by a
hypernetwork. As mentioned above, the input ![](http://latex.codecogs.com/svg.latex?x_i) and
output ![](http://latex.codecogs.com/svg.latex?W_i) of the hypernetwork always correspond one-to-one, so a change in the
input order results in the same change in the output order, thus achieving permutation equivariance.
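
A minimal NumPy sketch of this idea follows. It is not the repository's code: the hypernetwork is again assumed to be a single linear map, the hidden embedding is held fixed for simplicity, and `pe_output_layer`, `hyper_w`, and `hyper_b` are hypothetical names. Each ![](http://latex.codecogs.com/svg.latex?x_i) generates the weight vector and bias used to score its own action, so permuting the inputs permutes the outputs in the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, h = 4, 3, 5  # m entities, feature size k, hidden size h

# Assumption: a single-layer hypernetwork mapping x_i to (h weights, 1 bias).
hyper_w = rng.normal(size=(k, h + 1))
hyper_b = rng.normal(size=(h + 1,))


def pe_output_layer(hidden, X):
    """hidden: [h] embedding; X: [m, k] entity features; returns [m] values."""
    params = X @ hyper_w + hyper_b         # [m, h + 1], one row per x_i
    W, bias = params[:, :h], params[:, h]  # per-entity weight vector and bias
    # Output i depends only on x_i and the shared hidden embedding.
    return W @ hidden + bias


hidden = rng.normal(size=(h,))
X = rng.normal(size=(m, k))
perm = rng.permutation(m)
q = pe_output_layer(hidden, X)
q_permuted = pe_output_layer(hidden, X[perm])
print(np.allclose(q_permuted, q[perm]))  # True
```

In the full model the hidden embedding would itself come from permutation invariant layers, so it would not change when the inputs are reordered; the sketch models that by passing the same `hidden` to both calls.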

We emphasize that HPN is a general design and can be easily integrated into existing MARL algorithms
(e.g., [VDN](https://arxiv.org/pdf/1706.05296?ref=https://githubhelp.com),
[QMIX](http://proceedings.mlr.press/v80/rashid18a/rashid18a.pdf),
[MADDPG](https://proceedings.neurips.cc/paper/2017/file/68a9750337a418a86fe06c1991a1d64c-Paper.pdf),
[MAPPO](https://arxiv.org/pdf/2103.01955?ref=https://githubhelp.com)) to improve both the learning speed and the
converged performance. All parameters of HPN are trained end-to-end with backpropagation according to the
corresponding RL loss function.

## 2. Experimental Results

### 2.1 Applying HPN to fine-tuned VDN and QMIX

![The Full Comparison of HPN with SOTA on SMAC](./doc/figure/exp_comparison_with_SOTA.png)

### 2.2 Comparison with baselines considering permutation invariance or permutation equivariance property

![Comparison with Related Baselines](./doc/figure/exp_comparison_with_baselines.png)

| Scenarios    | Difficulty | HPN-QMIX |
|--------------|:----------:|:--------:|
| 8m_vs_9m     |    Hard    | **100%** |
| 5m_vs_6m     |    Hard    | **100%** |
| 3s_vs_5z     |    Hard    | **100%** |
| bane_vs_bane |    Hard    | **100%** |
| 2c_vs_64zg   |    Hard    | **100%** |
| corridor     | Super Hard | **100%** |
| MMM2         | Super Hard | **100%** |
| 3s5z_vs_3s6z | Super Hard | **100%** |
| 27m_vs_30m   | Super Hard | **100%** |
| 6h_vs_8z     | Super Hard | **98%**  |

## 3. How to use the code?

### Detailed command-line instructions to reproduce all experimental results

**Run an experiment**

```shell
# For SMAC, take the 5m_vs_6m scenario for example.

# 5m_vs_6m
CUDA_VISIBLE_DEVICES="0" python src/main.py --config=hpn_qmix --env-config=sc2 with env_args.map_name=5m_vs_6m obs_agent_id=True obs_last_action=False runner=parallel batch_size_run=8 buffer_size=5000 t_max=10050000 epsilon_anneal_time=100000 batch_size=128 td_lambda=0.6

# 3s5z_vs_3s6z
CUDA_VISIBLE_DEVICES="0" python src/main.py --config=api_qmix --env-config=sc2 with env_args.map_name=3s5z_vs_3s6z obs_agent_id=True obs_last_action=False runner=parallel batch_size_run=4 buffer_size=5000 t_max=10050000 epsilon_anneal_time=100000 batch_size=128 td_lambda=0.6

# 6h_vs_8z
CUDA_VISIBLE_DEVICES="0" python src/main.py --config=api_qmix --env-config=sc2 with env_args.map_name=6h_vs_8z obs_agent_id=True obs_last_action=False runner=parallel batch_size_run=8 buffer_size=5000 t_max=10050000 epsilon_anneal_time=500000 batch_size=128 td_lambda=0.3 hpn_head_num=2

# 8m_vs_9m
CUDA_VISIBLE_DEVICES="0" python src/main.py --config=api_qmix --env-config=sc2 with env_args.map_name=8m_vs_9m obs_agent_id=True obs_last_action=False runner=parallel batch_size_run=8 buffer_size=5000 t_max=10050000 epsilon_anneal_time=100000 batch_size=128 td_lambda=0.6

# 3s_vs_5z
CUDA_VISIBLE_DEVICES="0" python src/main.py --config=api_qmix --env-config=sc2 with env_args.map_name=3s_vs_5z obs_agent_id=True obs_last_action=False runner=parallel batch_size_run=8 buffer_size=5000 t_max=10050000 epsilon_anneal_time=100000 batch_size=128 td_lambda=0.6 hpn_head_num=2

# corridor
CUDA_VISIBLE_DEVICES="0" python src/main.py --config=api_qmix --env-config=sc2 with env_args.map_name=corridor obs_agent_id=True obs_last_action=False runner=parallel batch_size_run=8 buffer_size=5000 t_max=10050000 epsilon_anneal_time=100000 batch_size=128 td_lambda=0.6

# MMM2
CUDA_VISIBLE_DEVICES="0" python src/main.py --config=api_qmix --env-config=sc2 with env_args.map_name=MMM2 obs_agent_id=True obs_last_action=False runner=parallel batch_size_run=8 buffer_size=5000 t_max=10050000 epsilon_anneal_time=100000 batch_size=128 td_lambda=0.6

# 27m_vs_30m
CUDA_VISIBLE_DEVICES="0" python src/main.py --config=api_qmix --env-config=sc2 with env_args.map_name=27m_vs_30m obs_agent_id=True obs_last_action=False runner=parallel batch_size_run=8 buffer_size=5000 t_max=10050000 epsilon_anneal_time=100000 batch_size=128 td_lambda=0.6

# 2c_vs_64zg
CUDA_VISIBLE_DEVICES="0" python src/main.py --config=api_qmix --env-config=sc2 with env_args.map_name=2c_vs_64zg obs_agent_id=True obs_last_action=False runner=parallel batch_size_run=8 buffer_size=5000 t_max=10050000 epsilon_anneal_time=100000 batch_size=128 td_lambda=0.6

# bane_vs_bane
CUDA_VISIBLE_DEVICES="0" python src/main.py --config=api_qmix --env-config=sc2 with env_args.map_name=bane_vs_bane obs_agent_id=True obs_last_action=False runner=parallel batch_size_run=8 buffer_size=5000 t_max=10050000 epsilon_anneal_time=100000 batch_size=128 td_lambda=0.6
```
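
The commands above share most of their flags and differ only in a few per-map overrides. As a convenience, the following sketch assembles the same command strings in Python. The helper `build_command` and the dictionaries `COMMON` and `OVERRIDES` are hypothetical and not part of the repository; the flag values are copied from the commands listed above, and the config name is left as an explicit argument because the commands above use more than one.

```python
# Flags shared by every command listed above.
COMMON = {
    "obs_agent_id": "True",
    "obs_last_action": "False",
    "runner": "parallel",
    "batch_size_run": "8",
    "buffer_size": "5000",
    "t_max": "10050000",
    "epsilon_anneal_time": "100000",
    "batch_size": "128",
    "td_lambda": "0.6",
}

# Per-map overrides, copied from the commands listed above.
OVERRIDES = {
    "3s5z_vs_3s6z": {"batch_size_run": "4"},
    "6h_vs_8z": {
        "epsilon_anneal_time": "500000",
        "td_lambda": "0.3",
        "hpn_head_num": "2",
    },
    "3s_vs_5z": {"hpn_head_num": "2"},
}


def build_command(map_name, config, gpu="0"):
    """Return the shell command string for one SMAC map."""
    # Overrides replace shared values in place; new keys are appended.
    args = {**COMMON, **OVERRIDES.get(map_name, {})}
    parts = [
        "python",
        "src/main.py",
        f"--config={config}",
        "--env-config=sc2",
        "with",
        f"env_args.map_name={map_name}",
    ]
    parts += [f"{key}={value}" for key, value in args.items()]
    return f'CUDA_VISIBLE_DEVICES="{gpu}" ' + " ".join(parts)


print(build_command("6h_vs_8z", config="api_qmix"))
```

The printed string can be pasted into a shell, or passed to `subprocess.run(..., shell=True)` to launch a run from a script.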






