# README.md

## Overview

This repository contains implementations of **GEMS** and multiple **PSRO-based algorithms** (PSRO, Double Oracle, A-PSRO, and Alpha-PSRO) for the **Deceptive Messages** environment. This environment is a simplified sender-receiver game with a fixed number of arms and message dimensions.

The provided scripts allow you to:

* Run **GEMS** with latent-space generative modeling.
* Run classic **PSRO**, **Double Oracle**, **A-PSRO**, and **Alpha-PSRO** baselines.
* Collect results over multiple seeds with CSV logging for reproducibility.
* Optionally disable plotting for batch runs.

Each algorithm is implemented as a standalone Python file:

* `gems.py` — GEMS implementation.
* `psro.py` — Vanilla PSRO.
* `do.py` — Double Oracle.
* `apsro.py` — A-PSRO.
* `alphapsro.py` — Alpha-PSRO.

---

## 1. Installation

### Requirements

* **Python Version:** 3.11.9
* **Dependencies and versions used in our experiments:**

  * `torch==2.8.0+cu128`
  * `numpy==2.1.3`
  * `imageio==2.37.0`
  * `matplotlib==3.10.0`
  * `tqdm` (included in the install command below; version not pinned)

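Install the dependencies. The `+cu128` suffix denotes a CUDA 12.8 build of `torch`; if `pip` cannot find that build on PyPI, follow the official PyTorch installation instructions for the matching wheel index: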

```bash
pip install torch==2.8.0+cu128 numpy==2.1.3 imageio==2.37.0 matplotlib==3.10.0 tqdm
```

There is **no PettingZoo dependency** here because the Deceptive Messages environment is entirely self-contained.
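
To confirm that an environment matches the pinned versions before running experiments, a small check along the following lines may help. This is a minimal sketch, not part of the repository: `PINNED`, `installed_version`, and `find_mismatches` are hypothetical names, and `tqdm` is left out because no version is pinned for it.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the Requirements list above.
PINNED = {
    "torch": "2.8.0+cu128",
    "numpy": "2.1.3",
    "imageio": "2.37.0",
    "matplotlib": "3.10.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to an (expected, found) pair; found is None when it is not installed."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches
```

Calling `find_mismatches(PINNED)` returns an empty dictionary when every pinned package is installed at the listed version.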

---

## 2. Running GEMS

Run GEMS on the Deceptive Messages environment:

```bash
python gems.py \
  --K 5 \
  --M 3 \
  --iters 6 \
  --seeds 0 1 2 3 4 \
  --log-csv results/gems_run.csv
```

Key arguments:

| Argument       | Description                                  | Default   |
| -------------- | -------------------------------------------- | --------- |
| `--K`          | Number of arms                               | 5         |
| `--M`          | Message dimension                            | 3         |
| `--iters`      | Outer PSRO iterations                        | 6         |
| `--seeds`      | Space-separated RNG seeds                    | 0 1 2 3 4 |
| `--log-csv`    | CSV filename for logging                     | `None`    |
| `--log-latent` | Enable latent-space geometry metrics logging | Disabled  |

Example with latent logging, which also saves the metrics to `latent_metrics.npz` via `--save-latent-npz`:

```bash
python gems.py --log-latent --save-latent-npz latent_metrics.npz
```

---

## 3. Running PSRO

Run the vanilla PSRO baseline:

```bash
python psro.py \
  --K 5 \
  --M 3 \
  --iters 6 \
  --seeds 0 1 2 3 4 \
  --log-csv results/psro_avg.csv
```

Key arguments:

| Argument          | Description                 | Default  |
| ----------------- | --------------------------- | -------- |
| `--eval-episodes` | Rollouts per payoff entry   | 400      |
| `--oracle-epochs` | PPO epochs for each oracle  | 200      |
| `--ppo-batch`     | PPO batch size              | 256      |
| `--no_plot`       | Disable matplotlib plotting | Off (plotting enabled) |

---

## 4. Running Double Oracle (DO)

Run the Double Oracle baseline:

```bash
python do.py --iters 8 --seeds 0 1 2 3 4 --log-csv results/do_avg.csv
```

Key arguments:

| Argument         | Description                        | Default |
| ---------------- | ---------------------------------- | ------- |
| `--tol`          | Improvement threshold to accept BR | 1e-3    |
| `--log-interval` | Print PPO progress every N epochs  | 0 (off) |

---

## 5. Running A-PSRO

Run A-PSRO with exploration repeats:

```bash
python apsro.py --iters 6 --seeds 0 1 2 3 4 --log-csv results/apsro_avg.csv
```

Key arguments:

| Argument        | Description                                | Default |
| --------------- | ------------------------------------------ | ------- |
| `--fp-iters`    | Fictitious play iterations for meta-solver | 200     |
| `--num-repeats` | Exploration repeats per outer iteration    | 3       |
| `--lookahead-d` | Lookahead interpolation coefficient        | 0.1     |

---

## 6. Running Alpha-PSRO

Run Alpha-PSRO with Alpha-Rank meta-strategy computation:

```bash
python alphapsro.py --iters 6 --seeds 0 1 2 3 4 --log-csv results/alphapsro_avg.csv
```

Key arguments:

| Argument                | Description                    | Default |
| ----------------------- | ------------------------------ | ------- |
| `--alpha-rank-strength` | Alpha-Rank selection intensity | 8.0     |
| `--power-iters`         | Alpha-Rank power iterations    | 2000    |
| `--alpha-rank-tol`      | Convergence threshold          | 1e-12   |

---

## 7. Output

Each script writes a CSV file when `--log-csv` is set. Running the example commands from the sections above produces:

```
results/
  gems_run.csv
  psro_avg.csv
  do_avg.csv
  apsro_avg.csv
  alphapsro_avg.csv
```

Each CSV contains per-iteration metrics such as reward, exploitability, and other evaluation statistics.
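
For batch runs, the five scripts can be launched from one driver. The sketch below only builds commands from flags documented in this README (`--seeds` and `--log-csv`) and leaves every other argument at its default; `RUNS`, `build_command`, and `run_all` are hypothetical names, and the driver is not part of the repository.

```python
import subprocess
import sys

# Script name -> CSV path, mirroring the example commands in sections 2 to 6.
RUNS = {
    "gems.py": "results/gems_run.csv",
    "psro.py": "results/psro_avg.csv",
    "do.py": "results/do_avg.csv",
    "apsro.py": "results/apsro_avg.csv",
    "alphapsro.py": "results/alphapsro_avg.csv",
}


def build_command(script, csv_path, seeds=(0, 1, 2, 3, 4)):
    """Build the argument list for one run using the current interpreter."""
    return [
        sys.executable, script,
        "--seeds", *[str(seed) for seed in seeds],
        "--log-csv", csv_path,
    ]


def run_all(runs=RUNS, dry_run=True):
    """Print each command (dry run) or execute it; return the command lists."""
    commands = [build_command(script, path) for script, path in runs.items()]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if a script exits non-zero.
            subprocess.run(command, check=True)
    return commands
```

Calling `run_all(dry_run=False)` from the repository root would run the scripts one after another and stop at the first failure.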

---

## 8. Reproducibility Checklist

To exactly replicate our results:

1. Use Python 3.11.9.
2. Install dependencies with the exact versions listed above.
3. Run with the provided default hyperparameters.
4. Aggregate results across seeds using the `--log-csv` outputs.
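
Step 4 can be sketched as follows. The CSV schema is not documented here, so the column names `seed`, `iter`, and `exploitability` are assumptions, as is the layout of one row per seed and iteration; adjust them to the header of the actual files. The sample numbers are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, stdev


def aggregate_by_iteration(rows, iter_col="iter", metric_col="exploitability"):
    """Group rows by iteration and return {iteration: (mean, std, count)}.

    Column names are assumed, not taken from the repository's CSV files.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[int(row[iter_col])].append(float(row[metric_col]))
    summary = {}
    for iteration in sorted(grouped):
        values = grouped[iteration]
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[iteration] = (mean(values), spread, len(values))
    return summary


# In-memory sample with made-up values; for a real file use
# open("results/psro_avg.csv", newline="") in place of io.StringIO.
SAMPLE = """seed,iter,exploitability
0,1,0.50
1,1,0.30
0,2,0.20
1,2,0.10
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
summary = aggregate_by_iteration(rows)
```

Reporting the sample standard deviation and the seed count alongside the mean makes it easier to see how much of a difference between algorithms is due to seed variance.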

---

## 9. Troubleshooting

* **Permission errors:** Make sure you have write permissions for the `results/` folder.
* **Blank CSV files:** Check the console logs for runtime errors.
* **Plots not showing:** Ensure `matplotlib` is correctly installed, or pass `--no_plot` to disable plotting.
* **GPU issues:** Use `--device cpu` to force CPU mode if CUDA is unavailable.
