# README.md

## Overview

This folder contains **Kuhn Poker** benchmarks for:

* `gems.py` — **GEMS‑OMWU** (constant, hardened) with latent population growth and ABR‑TR.
* `psro.py` — **Vanilla PSRO** with **PPO** best‑responses.
* `do.py` — **Double Oracle** (PPO BRs).
* `apsro.py` — **A‑PSRO** (PPO candidates + PPO oracles).
* `alphapsro.py` — **α‑PSRO** (PPO BRs with α‑Rank meta solver).

Each script supports **multi‑seed** runs and writes CSV logs you can aggregate across experiments. Plotting can be disabled for batch runs.

> These scripts are standalone for Kuhn Poker — **no PettingZoo** is used here.

---

## 1) Environment & Dependencies

* **Python:** 3.11.9
* **Exact library versions used in our runs:**

  * `torch==2.8.0+cu128`
  * `numpy==2.1.3`
  * `matplotlib==3.10.0`

Install (CUDA 12.8 example; `tqdm` is left unpinned):

```bash
pip install numpy==2.1.3 matplotlib==3.10.0 tqdm
# PyTorch: pick your platform-specific wheel
# Example (CUDA 12.8 builds):
pip install torch==2.8.0+cu128 --index-url https://download.pytorch.org/whl/cu128
```

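> If CUDA is unavailable, install the CPU wheel instead: drop the `+cu128` suffix and point pip at the CPU index, e.g. `pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cpu`.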
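
To confirm that an environment matches the pins above, a small check along the following lines may help. It is a sketch, not part of the benchmark scripts: `PINNED`, `strip_local`, and `check_pins` are hypothetical names, and the local build tag (`+cu128`) is ignored so that CPU and CUDA wheels compare equal.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the dependency list above (tqdm has no pinned version there).
PINNED = {"torch": "2.8.0", "numpy": "2.1.3", "matplotlib": "3.10.0"}


def strip_local(v: str) -> str:
    """Drop a local build tag such as '+cu128' from a version string."""
    return v.split("+", 1)[0]


def check_pins(pins: dict[str, str]) -> dict[str, str]:
    """Return {package: problem} for each package that is missing or mismatched."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if strip_local(found) != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems
```

Calling `check_pins(PINNED)` returns an empty dict when every pinned package is installed at the expected version.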

---

## 2) Conventions

* All scripts accept `--seeds` as a **comma-separated string** (e.g., `"0,1,2,3,4"`).
* CSV outputs are placed under `--outdir` with a filename prefix `--csv_base`.
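* Determinism: Each script seeds its RNGs once per seed. Bit-exact results on GPU may additionally require enabling PyTorch's deterministic settings yourself (e.g., `torch.use_deterministic_algorithms(True)`).
* Plots: Pass `--no_plots` (spelled `--no-plot` in some scripts; check the script's argument list) to suppress `matplotlib` windows during batch runs.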
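
The seed and determinism conventions above can be sketched as follows. This is an illustration under assumptions, not the scripts' actual code: `parse_seeds` and `seed_everything` are hypothetical helper names, and the cuDNN flags are the manual opt-in mentioned in the determinism bullet.

```python
import argparse
import random


def parse_seeds(text: str) -> list[int]:
    """Turn a string such as "0,1,2" into [0, 1, 2], ignoring blank entries."""
    return [int(tok) for tok in text.split(",") if tok.strip()]


def seed_everything(seed: int) -> None:
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
        # Manual opt-in: trade some speed for repeatable cuDNN kernels.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


parser = argparse.ArgumentParser()
# argparse applies `type` to string defaults, so the default becomes [0].
parser.add_argument("--seeds", type=parse_seeds, default="0")
```

A driver would then loop over `parser.parse_args().seeds` and call `seed_everything(seed)` before each run.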

---

## 3) Running GEMS‑OMWU (`gems.py`)

**Description:** GEMS with an optimistic MWU (OMWU) meta solver, constant‑work ABR‑TR, and guarded numerics.

**Key args:**

* Meta: `--eta`, `--eta_sched {const,sqrt,harmonic}`, `--ema`
* Oracle replacement: `--pool_mut`, `--pool_rand`, `--replace {least_mass,worst_ev}`
* ABR‑TR: `--abr_steps`, `--abr_lr`, `--beta_kl`, `--tau`
* Guardrails: `--clip_grad`, `--logit_cap`, `--prob_eps`, `--mwu_grad_cap`
* I/O: `--outdir`, `--csv_base`, `--device {auto,cpu,cuda}`, `--seeds`, `--no_plots`
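
This README does not define the three `--eta_sched` choices. One plausible reading, given here as an assumption rather than the confirmed behavior of `gems.py`, is the usual family of step-size decays: constant, inverse square root, and inverse linear in the iteration index. The function name `eta_at` is hypothetical.

```python
import math


def eta_at(eta0: float, sched: str, t: int) -> float:
    """Step size at 1-based iteration t under an ASSUMED decay rule."""
    if t < 1:
        raise ValueError("t must be >= 1")
    if sched == "const":
        return eta0                  # no decay
    if sched == "sqrt":
        return eta0 / math.sqrt(t)   # eta0 / sqrt(t)
    if sched == "harmonic":
        return eta0 / t              # eta0 / t
    raise ValueError(f"unknown schedule: {sched!r}")
```

Under this reading, `harmonic` shrinks the step fastest, which is why changing the schedule can noticeably alter convergence.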

**Example:**

```bash
python gems.py \
  --iters 40 --kmax 8 \
  --eta 0.08 --eta_sched harmonic --ema 0.0 \
  --pool_mut 2 --pool_rand 1 --replace least_mass \
  --abr_steps 30 --abr_lr 5e-4 --beta_kl 1e-2 --tau 1.0 \
  --clip_grad 1.0 --logit_cap 50.0 --prob_eps 1e-6 \
  --outdir runs/gems --csv_base gems_kuhn_const \
  --device auto --seeds "0,1,2,3,4" --no_plots
```
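
For readers unfamiliar with optimistic MWU, the sketch below shows one textbook update step for a payoff-maximizing player, with a logit cap and a probability floor in the spirit of `--logit_cap` and `--prob_eps`. How `gems.py` actually applies these guardrails is an assumption here, and `omwu_step` is a hypothetical name.

```python
import math


def omwu_step(weights, payoff, prev_payoff, eta, logit_cap=50.0, prob_eps=1e-6):
    """One optimistic multiplicative-weights update over a finite strategy set.

    weights      current mixed strategy (sums to 1)
    payoff       per-strategy payoff observed this iteration
    prev_payoff  per-strategy payoff observed last iteration
    """
    logits = []
    for w, g, g_prev in zip(weights, payoff, prev_payoff):
        # Optimism: step along 2*g_t - g_{t-1} instead of g_t alone.
        z = math.log(max(w, prob_eps)) + eta * (2.0 * g - g_prev)
        # Guardrail (assumed): clamp logits to avoid overflow in exp().
        logits.append(max(-logit_cap, min(logit_cap, z)))
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # stable softmax
    total = sum(exps)
    # Guardrail (assumed): floor probabilities, then renormalize.
    probs = [max(e / total, prob_eps) for e in exps]
    norm = sum(probs)
    return [p / norm for p in probs]
```

For example, `omwu_step([0.5, 0.5], [1.0, 0.0], [0.0, 0.0], eta=0.08)` shifts mass toward the first strategy while keeping the result a valid distribution.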

---

## 4) Running PSRO (`psro.py`)

**Description:** PSRO with PPO best‑responses and an explicit learning rate (`--ppo_lr`).

**Key args:**

* Outer/meta: `--iters`, `--meta_loops`, `--eta`, `--eta_sched`, `--kmax`
* PPO BRs: `--ppo_rollouts`, `--ppo_epochs`, `--ppo_batch`, `--ppo_lr`, `--clip`, `--ent_beta`, `--gamma`, `--gae_lambda`, `--max_grad_norm`
* Numerics: `--prob_eps`, `--logit_cap`
* I/O: `--outdir`, `--csv_base`, `--seeds`, `--no_plots`

**Example:**

```bash
python psro.py \
  --iters 40 --meta_loops 200 --eta 0.25 --eta_sched harmonic --kmax 0 \
  --ppo_rollouts 4000 --ppo_epochs 10 --ppo_batch 512 --ppo_lr 3e-4 \
  --clip 0.2 --ent_beta 1e-3 --gamma 1.0 --gae_lambda 0.95 --max_grad_norm 1.0 \
  --prob_eps 1e-6 --logit_cap 50.0 \
  --outdir runs/psro --csv_base psro_kuhn_ppo_br --seeds "0,1,2,3,4" --no_plots
```
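
The PPO flags `--gamma`, `--gae_lambda`, and `--clip` correspond to standard quantities. The sketch below shows the textbook definitions of generalized advantage estimation and the clipped surrogate for a single episode; it is not taken from `psro.py`, and the function names are hypothetical.

```python
def gae(rewards, values, gamma=1.0, lam=0.95):
    """Generalized advantage estimates for one finished episode.

    values[t] estimates the return from step t; the value after the
    final step is taken to be 0 because the episode has terminated.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_surrogate(ratio, advantage, clip=0.2):
    """PPO objective for one sample: min of unclipped and clipped terms."""
    clipped_ratio = max(1.0 - clip, min(1.0 + clip, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)
```

With `gamma=1.0`, as in the example command, rewards are undiscounted, which suits Kuhn Poker's very short episodes.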

---

## 5) Running Double Oracle (`do.py`)

**Description:** Classic DO loop with acceptance threshold.

**Key args:** `--iters`, `--meta_loops`, `--eta`, `--eta_sched`, `--kmax`, `--tol`; numerics and I/O flags follow the same style as in `psro.py`.

**Example:**

```bash
python do.py \
  --iters 40 --meta_loops 300 --eta 0.3 --eta_sched harmonic --kmax 0 \
  --prob_eps 1e-12 --logit_cap 50.0 \
  --outdir runs/do --csv_base double_oracle_kuhn --seeds "0,1,2,3,4" --no_plots
```

---

## 6) Running A‑PSRO (`apsro.py`)

**Description:** A‑PSRO with PPO candidates/oracles and advantage thresholding.

**Key args:**

* Outer/meta: `--iters`, `--meta_loops`, `--eta`, `--eta_sched`, `--kmax`, `--adv_thresh`
* PPO: `--ppo_rollouts`, `--ppo_epochs`, `--ppo_batch`, `--ppo_lr`, `--clip`, `--ent_beta`, `--gamma`, `--gae_lambda`, `--max_grad_norm`
* Numerics/I‑O: as above

**Example:**

```bash
python apsro.py \
  --iters 40 --meta_loops 200 --eta 0.25 --eta_sched harmonic --kmax 0 \
  --adv_thresh 0.0 \
  --ppo_rollouts 4000 --ppo_epochs 10 --ppo_batch 512 --ppo_lr 3e-4 \
  --clip 0.2 --ent_beta 1e-3 --gamma 1.0 --gae_lambda 0.95 --max_grad_norm 1.0 \
  --prob_eps 1e-6 --logit_cap 50.0 \
  --outdir runs/apsro --csv_base apsro_kuhn_ppo_br --seeds "0,1,2,3,4" --no_plots
```

---

## 7) Running α‑PSRO (`alphapsro.py`)

**Description:** PSRO variant with **α‑Rank** meta solver.

**Key args:**

* Outer: `--iters`, `--alpha` (selection intensity), `--kmax`
* PPO BRs: same as PSRO
* Numerics/I‑O: as above

**Example:**

```bash
python alphapsro.py \
  --iters 40 --alpha 10.0 --kmax 0 \
  --ppo_rollouts 4000 --ppo_epochs 10 --ppo_batch 512 --ppo_lr 3e-4 \
  --clip 0.2 --ent_beta 1e-3 --gamma 1.0 --gae_lambda 0.95 --max_grad_norm 1.0 \
  --prob_eps 1e-6 --logit_cap 50.0 \
  --outdir runs/alphapsro --csv_base alpha_psro_kuhn_ppo_br --seeds "0,1,2,3,4" --no_plots
```

---

## 8) Outputs

Each script writes per‑seed or per‑run CSVs under `--outdir` with prefix `--csv_base`, e.g.:

```
runs/
  gems/
    gems_kuhn_const_seed0.csv
    ...
  psro/
    psro_kuhn_ppo_br_seed0.csv
    ...
```

Columns typically include iteration counters, exploitability/EV, BR losses, and timing; the exact set differs by script, so check each CSV header before aggregating across seeds with your analysis scripts.
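
Aggregation across seeds could look like the sketch below, which uses only the standard library. The column names `iter` and `exploitability` are assumptions for illustration (check the real CSV header), as is the `<csv_base>_seed<N>.csv` pattern inferred from the listing above; `aggregate` is a hypothetical helper.

```python
import csv
import glob
import os
import statistics
from collections import defaultdict


def aggregate(outdir, csv_base, key="iter", metric="exploitability"):
    """Per-`key` mean, sample std, and count of `metric` across seed CSVs."""
    per_key = defaultdict(list)
    pattern = os.path.join(outdir, f"{csv_base}_seed*.csv")
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh):
                per_key[int(row[key])].append(float(row[metric]))
    summary = {}
    for k in sorted(per_key):
        vals = per_key[k]
        # stdev needs at least two samples; report 0.0 for a single seed.
        std = statistics.stdev(vals) if len(vals) > 1 else 0.0
        summary[k] = (statistics.mean(vals), std, len(vals))
    return summary
```

For instance, `aggregate("runs/gems", "gems_kuhn_const")` would return a dict mapping each iteration to `(mean, std, n_seeds)`, or an empty dict if no files match.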

---

## 9) Reproducibility

1. Use **Python 3.11.9**.
2. Install the **exact** library versions listed above.
3. Run the provided example commands unchanged.
4. Keep `--seeds` and `--eta_sched` consistent with our defaults.

---

## 10) Troubleshooting

* **CUDA errors / missing GPU:** Add `--device cpu` (for `gems.py`) or install the correct CUDA wheel for `torch`.
* **Plots popping up in batch runs:** Pass `--no_plots` / `--no-plot` (script‑specific flag name).
* **CSV empty or missing:** Check stdout for exceptions; ensure `--outdir` exists and is writable.
* **Different results vs paper:** Verify `--eta_sched`, `--ppo_lr`, and `--kmax`; these strongly affect convergence.
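
For the CUDA item above, the `--device {auto,cpu,cuda}` choice is commonly resolved as in the sketch below. This is an assumption about typical behavior, not a copy of `gems.py`; `resolve_device` is a hypothetical name, and in practice the second argument would come from `torch.cuda.is_available()`.

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Map a --device value to a concrete device name."""
    if requested not in ("auto", "cpu", "cuda"):
        raise ValueError(f"unknown device: {requested!r}")
    if requested == "auto":
        # Prefer the GPU when one is usable, otherwise fall back to CPU.
        return "cuda" if cuda_available else "cpu"
    if requested == "cuda" and not cuda_available:
        # Fail early with a clear message instead of a late CUDA error.
        raise RuntimeError("CUDA requested but not available; use --device cpu")
    return requested
```

Failing early on an unavailable `cuda` request surfaces the problem before any training work starts.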
