
# Encouraging metric-aware diversity in contrastive representation space

> In cooperative Multi-Agent Reinforcement Learning (MARL), agents that share policy network parameters often learn similar behaviors, which hinders effective exploration and can lead to suboptimal cooperative policies. Recent advances have attempted to promote multi-agent diversity by leveraging the Wasserstein distance to increase policy differences. However, these approaches overlook the fact that, initially, agents' policies are highly similar due to shared network parameters, causing the Wasserstein distance, designed to measure policy differences, to approach zero. As a result, these methods cannot effectively encourage diverse policies. To address this limitation, we propose Wasserstein Contrastive Diversity (WCD) exploration, a novel approach that promotes multi-agent diversity by maximizing the Wasserstein distance between the trajectory distributions of different agents in a latent representation space. To make the Wasserstein distance meaningful, we propose a novel next-step prediction method based on Contrastive Predictive Coding (CPC) to learn distinguishable trajectory representations. Additionally, we introduce an optimized kernel-based method to compute the Wasserstein distance more efficiently. Since the Wasserstein distance is inherently defined for two distributions, we extend it to support multiple agents, enabling diverse policy learning. Empirical evaluations across a variety of challenging multi-agent tasks demonstrate that WCD outperforms existing state-of-the-art methods, delivering superior performance and enhanced exploration.
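
The abstract's point that the Wasserstein distance approaches zero when agents behave almost identically can be seen in a toy example. The sketch below is not the WCD implementation: it uses scalar samples as stand-ins for trajectory representations, and the helper `wasserstein_1d` and the `agent_*` arrays are hypothetical names introduced only for illustration. It relies on the standard fact that, for two equal-sized one-dimensional samples, the empirical 1-Wasserstein distance is the mean absolute difference between the sorted samples.

```python
import numpy as np


def wasserstein_1d(x, y):
    """Empirical 1-Wasserstein distance between two equal-sized 1-D samples.

    Illustrative helper (not part of this repository): in one dimension the
    optimal transport plan matches sorted samples to each other.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))


rng = np.random.default_rng(0)

# Assumption: scalar features standing in for per-agent trajectory embeddings.
shared = rng.normal(0.0, 1.0, size=1000)

# Two agents with shared parameters: almost identical feature distributions.
agent_a = shared + rng.normal(0.0, 1e-3, size=1000)
agent_b = shared + rng.normal(0.0, 1e-3, size=1000)

# A third agent whose features come from a clearly shifted distribution.
agent_c = rng.normal(3.0, 1.0, size=1000)

near = wasserstein_1d(agent_a, agent_b)  # close to zero: little signal to maximize
far = wasserstein_1d(agent_a, agent_c)   # large: the distributions are separable

print(f"similar agents:  {near:.4f}")
print(f"distinct agents: {far:.4f}")
```

When the two samples are nearly indistinguishable, the distance carries almost no signal, which is the motivation the abstract gives for first learning distinguishable trajectory representations.
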
## Requirements

To install requirements:

```setup
pip install -r requirements.txt
```

## Run an experiment

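To train the model(s) in the paper, run the command below. Judging by their names, `--config=wcd_smac_parallel` selects the WCD algorithm configuration and `--env-config=sc2` selects the StarCraft II environment: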

```train
python3 src/main.py --config=wcd_smac_parallel --env-config=sc2
```
