# README.md

## Overview

This folder contains **Kuhn Poker** benchmarks for:

* `gems.py` — **GEMS** (constant, hardened) with latent population growth and ABR‑TR.
* `psro.py` — **Vanilla PSRO** with **PPO** best‑responses.
* `apsro.py` — **A‑PSRO** (PPO candidates + PPO oracles).
* `alphapsro.py` — **α‑PSRO** (PPO BRs with α‑Rank meta solver).
* `epsro.py` — **EPSRO** (URR + deterministic meta strategy optimization).
* `neupl.py` — **NeuPL** (Conditional network population).
* `p2psro.py` — **P2SRO** (Pipeline PSRO).

Each script supports **multi‑seed** runs and writes CSV logs you can aggregate across experiments. Plotting can be disabled for batch runs.

> These scripts are standalone for Kuhn Poker — **no PettingZoo** is used here.

---

## 1) Environment & Dependencies

* **Python:** 3.11.9
* **Exact library versions used in our runs:**

  * `torch==2.8.0+cu128`
  * `numpy==2.1.3`
  * `matplotlib==3.10.0`

Install (CUDA 12.8 example; for CPU-only machines see the note below):

```bash
pip install numpy==2.1.3 matplotlib==3.10.0 tqdm
# PyTorch: pick your platform-specific wheel
# Example (CUDA 12.8 builds):
pip install torch==2.8.0+cu128 --index-url https://download.pytorch.org/whl/cu128
```

> If CUDA is unavailable, install the CPU wheel instead, e.g. `pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cpu`.

---

## 2) Conventions

* All scripts accept `--seeds` as a **comma-separated string** (e.g., `"0,1,2,3,4"`).
* CSV outputs are placed under `--outdir` with a filename prefix `--csv_base`.
* Determinism: each script seeds its RNGs with the current seed value; full determinism on GPU may additionally require enabling CUDA/PyTorch deterministic settings manually.
* Plots: Use `--no_plots` (some scripts spell it `--no-plot`) to suppress `matplotlib` windows during batch runs.
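A minimal sketch of how a comma-separated `--seeds` value can be parsed, and of what "setting CUDA deterministic flags manually" can look like for one run. `parse_seeds` and `seed_everything` are hypothetical helper names, not functions exported by the scripts; the PyTorch calls are standard APIs, but whether a given script already sets them is not guaranteed.

```python
import os
import random

import numpy as np

try:
    import torch
except ImportError:  # lets the sketch run where only NumPy is installed
    torch = None


def parse_seeds(spec: str) -> list[int]:
    """Turn a --seeds string such as "0,1,2,3,4" into a list of ints."""
    return [int(tok) for tok in spec.split(",") if tok.strip()]


def seed_everything(seed: int, deterministic_gpu: bool = False) -> None:
    """Seed Python, NumPy and (if installed) PyTorch RNGs for one run."""
    random.seed(seed)
    np.random.seed(seed)
    if torch is None:
        return
    torch.manual_seed(seed)  # seeds the CPU and all CUDA generators
    if deterministic_gpu:
        # cuBLAS needs this for deterministic matmuls on CUDA >= 10.2;
        # it only takes effect if set before the first cuBLAS call.
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        torch.use_deterministic_algorithms(True)
```

`torch.use_deterministic_algorithms(True)` makes PyTorch raise an error on operations that lack a deterministic implementation and can slow training, which is why the sketch leaves it off unless `deterministic_gpu=True`.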

---

## 3) Running GEMS‑OMWU (`gems.py`)

**Description:** GEMS with optimistic MWU meta, constant‑work ABR‑TR, and guarded numerics.

**Key args:**

* Outer: `--iters`, `--kmax`
* Meta: `--eta`, `--eta_sched {const,sqrt,harmonic}`, `--ema`
* Oracle replacement: `--pool_mut`, `--pool_rand`, `--replace {least_mass,worst_ev}`
* ABR‑TR: `--abr_steps`, `--abr_lr`, `--beta_kl`, `--tau`
* Guardrails: `--clip_grad`, `--logit_cap`, `--prob_eps`, `--mwu_grad_cap`
* I/O: `--outdir`, `--csv_base`, `--device {auto,cpu,cuda}`, `--seeds`, `--no_plots`

**Example:**

```bash
python gems.py \
  --iters 40 --kmax 8 \
  --eta 0.08 --eta_sched harmonic --ema 0.0 \
  --pool_mut 2 --pool_rand 1 --replace least_mass \
  --abr_steps 30 --abr_lr 5e-4 --beta_kl 1e-2 --tau 1.0 \
  --clip_grad 1.0 --logit_cap 50.0 --prob_eps 1e-6 \
  --outdir runs/gems --csv_base gems_kuhn_const \
  --device auto --seeds "0,1,2,3,4" --no_plots
```
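The optimistic MWU meta step can be sketched as follows on a symmetric zero-sum meta-game payoff matrix. This is an illustration under assumptions, not the code in `gems.py`: the schedule formulas (`eta / sqrt(t)` and `eta / t`), the use of `--mwu_grad_cap` as an elementwise clip on the payoff gradient, and `--prob_eps` as a probability floor are guesses at the flag semantics, and `--ema` is not modeled.

```python
import numpy as np


def eta_at(eta0: float, sched: str, t: int) -> float:
    """Step size at meta-iteration t >= 1 (schedule formulas are assumptions)."""
    if sched == "const":
        return eta0
    if sched == "sqrt":
        return eta0 / np.sqrt(t)
    if sched == "harmonic":
        return eta0 / t
    raise ValueError(f"unknown eta schedule: {sched}")


def softmax(x: np.ndarray) -> np.ndarray:
    p = np.exp(x - x.max())
    return p / p.sum()


def omwu_meta(payoff, iters=200, eta=0.08, sched="harmonic",
              grad_cap=10.0, prob_eps=1e-6):
    """Optimistic MWU in self-play on a symmetric zero-sum meta-game.

    payoff[i, j] is the EV of population member i against member j.
    Returns the final meta-strategy (a probability vector over members).
    """
    k = payoff.shape[0]
    logits = np.zeros(k)
    prev_grad = np.zeros(k)
    for t in range(1, iters + 1):
        p = softmax(logits)
        # EV of each member against the current mixture, clipped for stability.
        grad = np.clip(payoff @ p, -grad_cap, grad_cap)
        # Optimistic step: the current gradient counts twice, the previous one is subtracted.
        logits += eta_at(eta, sched, t) * (2.0 * grad - prev_grad)
        prev_grad = grad
    p = np.maximum(softmax(logits), prob_eps)  # floor tiny probabilities
    return p / p.sum()
```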

---

## 4) Running PSRO (`psro.py`)

**Description:** Vanilla PSRO with PPO best‑responses; the PPO learning rate is set explicitly via `--ppo_lr`.

**Key args:**

* Outer/meta: `--iters`, `--meta_loops`, `--eta`, `--eta_sched`, `--kmax`
* PPO BRs: `--ppo_rollouts`, `--ppo_epochs`, `--ppo_batch`, `--ppo_lr`, `--clip`, `--ent_beta`, `--gamma`, `--gae_lambda`, `--max_grad_norm`
* Numerics: `--prob_eps`, `--logit_cap`
* I/O: `--outdir`, `--csv_base`, `--seeds`, `--no_plots`

**Example:**

```bash
python psro.py \
  --iters 40 --meta_loops 200 --eta 0.25 --eta_sched harmonic --kmax 0 \
  --ppo_rollouts 4000 --ppo_epochs 10 --ppo_batch 512 --ppo_lr 3e-4 \
  --clip 0.2 --ent_beta 1e-3 --gamma 1.0 --gae_lambda 0.95 --max_grad_norm 1.0 \
  --prob_eps 1e-6 --logit_cap 50.0 \
  --outdir runs/psro --csv_base psro_kuhn_ppo_br --seeds "0,1,2,3,4" --no_plots
```
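For reference, the two quantities most PPO flags control, generalized advantage estimation (`--gamma`, `--gae_lambda`) and the clipped surrogate loss with an entropy bonus (`--clip`, `--ent_beta`), look like this in plain NumPy. It is a standalone sketch rather than the code in `psro.py`: gradient clipping (`--max_grad_norm`) and the value loss are omitted, and each episode is assumed to terminate after its last step, as every Kuhn Poker hand does.

```python
import numpy as np


def compute_gae(rewards, values, gamma=1.0, lam=0.95):
    """GAE for one terminated episode; the value after the last step is taken as 0."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    adv = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(rewards) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        running = delta + gamma * lam * running
        adv[t] = running
    return adv, adv + values  # advantages and value targets


def ppo_policy_loss(logp_new, logp_old, adv, entropy, clip=0.2, ent_beta=1e-3):
    """Clipped surrogate loss (to be minimized) minus an entropy bonus."""
    adv = np.asarray(adv, dtype=float)
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip) * adv
    return -np.mean(np.minimum(unclipped, clipped)) - ent_beta * np.mean(entropy)
```

With `gae_lambda=1.0` and `gamma=1.0` the value targets reduce to Monte Carlo returns.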

---

## 5) Running A‑PSRO (`apsro.py`)

**Description:** A‑PSRO with PPO candidates/oracles and advantage thresholding.

**Key args:**

* Outer/meta: `--iters`, `--meta_loops`, `--eta`, `--eta_sched`, `--kmax`, `--adv_thresh`
* PPO: `--ppo_rollouts`, `--ppo_epochs`, `--ppo_batch`, `--ppo_lr`, `--clip`, `--ent_beta`, `--gamma`, `--gae_lambda`, `--max_grad_norm`
* Numerics/I‑O: as above

**Example:**

```bash
python apsro.py \
  --iters 40 --meta_loops 200 --eta 0.25 --eta_sched harmonic --kmax 0 \
  --adv_thresh 0.0 \
  --ppo_rollouts 4000 --ppo_epochs 10 --ppo_batch 512 --ppo_lr 3e-4 \
  --clip 0.2 --ent_beta 1e-3 --gamma 1.0 --gae_lambda 0.95 --max_grad_norm 1.0 \
  --prob_eps 1e-6 --logit_cap 50.0 \
  --outdir runs/apsro --csv_base apsro_kuhn_ppo_br --seeds "0,1,2,3,4" --no_plots
```
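One plausible reading of advantage thresholding is that a PPO candidate joins the population only if its expected payoff against the current meta-strategy beats the meta-strategy's own value by more than `--adv_thresh`. The sketch below implements that reading; it is an assumption about `apsro.py`, and `candidate_advantage` and `admit` are hypothetical names. In a symmetric zero-sum meta-game the meta-strategy's value against itself is 0, which is the default used here.

```python
import numpy as np


def candidate_advantage(cand_vs_pop, meta, meta_value=0.0):
    """Advantage of a candidate over the current meta-strategy.

    cand_vs_pop[j]: estimated EV of the candidate against population member j.
    meta[j]: meta-strategy probability of member j.
    meta_value: EV of the meta-strategy against itself (0 if symmetric zero-sum).
    """
    return float(np.dot(cand_vs_pop, meta)) - meta_value


def admit(candidates, meta, adv_thresh=0.0, meta_value=0.0):
    """Indices of candidates whose advantage strictly exceeds adv_thresh."""
    return [i for i, row in enumerate(candidates)
            if candidate_advantage(row, meta, meta_value) > adv_thresh]
```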

---

## 6) Running α‑PSRO (`alphapsro.py`)

**Description:** PSRO variant that uses **α‑Rank** as the meta solver.

**Key args:**

* Outer: `--iters`, `--alpha` (selection intensity), `--kmax`
* PPO BRs: same as PSRO
* Numerics/I‑O: as above

**Example:**

```bash
python alphapsro.py \
  --iters 40 --alpha 10.0 --kmax 0 \
  --ppo_rollouts 4000 --ppo_epochs 10 --ppo_batch 512 --ppo_lr 3e-4 \
  --clip 0.2 --ent_beta 1e-3 --gamma 1.0 --gae_lambda 0.95 --max_grad_norm 1.0 \
  --prob_eps 1e-6 --logit_cap 50.0 \
  --outdir runs/alphapsro --csv_base alpha_psro_kuhn_ppo_br --seeds "0,1,2,3,4" --no_plots
```
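The α‑Rank meta solver can be sketched for a single population as a Markov chain over strategies: a mutant `r` invades a resident `s` with a fixation probability that sharpens as `--alpha` grows, and the meta-strategy is the chain's stationary distribution. This follows the standard single-population α‑Rank construction; the population size `m` is a hypothetical parameter not listed among the script's flags, and `alphapsro.py` may differ in details such as tie handling or the large-α limit.

```python
import math

import numpy as np


def fixation_prob(x: float, m: int) -> float:
    """rho = (1 - e^{-x}) / (1 - e^{-m x}), rearranged so it never overflows."""
    if abs(x) < 1e-12:
        return 1.0 / m  # neutral mutant
    if x > 0:
        return math.expm1(-x) / math.expm1(-m * x)
    return math.exp((m - 1) * x) * math.expm1(x) / math.expm1(m * x)


def alpha_rank(payoff, alpha=10.0, m=50):
    """Single-population alpha-Rank on a symmetric meta-game.

    payoff[i, j]: EV of strategy i against strategy j. Returns the stationary
    distribution of the chain over monomorphic populations.
    """
    k = payoff.shape[0]
    if k == 1:
        return np.ones(1)
    eta = 1.0 / (k - 1)  # uniform choice of which mutant appears
    C = np.zeros((k, k))
    for s in range(k):
        for r in range(k):
            if r != s:
                x = alpha * (payoff[r, s] - payoff[s, r])  # mutant r vs resident s
                C[s, r] = eta * fixation_prob(x, m)
        C[s, s] = 1.0 - C[s].sum()
    # Solve pi C = pi together with sum(pi) = 1.
    A = np.vstack([C.T - np.eye(k), np.ones(k)])
    b = np.zeros(k + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    pi = np.clip(pi, 0.0, None)
    return pi / pi.sum()
```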

---

## 7) Running EPSRO (`epsro.py`)

**Description:** EPSRO (URR + deterministic meta strategy optimization).

**Key args:**

* Outer: `--iters`, `--kmax`, `--pipeline_levels`, `--plateau_window`
* URR/Opt: `--urr_steps`, `--theta_lr`, `--beta_lr`, `--beta_sched`, `--theta_sched`
* Trust Region: `--beta_kl`, `--beta_temp`, `--entropy_coef`
* Numerics/I‑O: as above

**Example:**

```bash
python epsro.py \
  --iters 40 --kmax 8 \
  --urr_steps 200 --theta_lr 5e-4 --beta_lr 0.1 \
  --pipeline_levels 3 \
  --outdir runs/epsro --csv_base epsro_kuhn \
  --device auto --seeds "0,1,2,3,4" --no_plots
```
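The trust-region flags suggest a KL-penalized policy update. As a hedged illustration (the mapping of `--beta_kl` and `--entropy_coef` onto this formula is an assumption, and `--beta_temp` is not modeled), `objective` below scores an action distribution at one information state and `trust_region_step` returns its exact maximizer over the probability simplex.

```python
import numpy as np


def kl(p, q):
    """KL(p || q) for strictly positive probability vectors."""
    return float(np.sum(p * np.log(p / q)))


def entropy(p):
    return float(-np.sum(p * np.log(p)))


def objective(q_values, pi, pi_old, beta_kl, entropy_coef):
    """Expected action value minus a KL penalty to pi_old plus an entropy bonus."""
    return float(pi @ q_values) - beta_kl * kl(pi, pi_old) + entropy_coef * entropy(pi)


def trust_region_step(q_values, pi_old, beta_kl=1e-2, entropy_coef=0.0):
    """Closed-form maximizer of `objective`:
    pi* is proportional to pi_old**(beta/(beta+c)) * exp(q/(beta+c))."""
    temp = beta_kl + entropy_coef  # must be > 0
    logits = (beta_kl * np.log(pi_old) + q_values) / temp
    logits -= logits.max()  # numerical stability
    pi = np.exp(logits)
    return pi / pi.sum()
```

A large `beta_kl` keeps the update close to `pi_old`; a small one moves it toward the greedy action.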

---

## 8) Running NeuPL (`neupl.py`)

**Description:** NeuPL with a conditional network population.

**Key args:**

* Outer: `--iters`, `--kmax`, `--meta_loops`, `--eta`, `--zdim`
* ABR: `--abr_steps`, `--abr_lr`, `--beta_kl`
* Numerics/I‑O: as above

**Example:**

```bash
python neupl.py \
  --iters 40 --kmax 8 \
  --meta_loops 200 --eta 0.1 --zdim 16 \
  --abr_steps 50 --abr_lr 5e-4 \
  --outdir runs/neupl --csv_base neupl_kuhn \
  --device auto --seeds "0,1,2,3,4" --no_plots
```
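A conditional network population stores every population member in one shared network and selects a member by conditioning on a code vector. The NumPy sketch below conditions on a per-member embedding of size `--zdim`; the embedding table, layer sizes, observation size and the `ConditionalPolicy` name are illustrative assumptions, and how `neupl.py` builds its conditioning vector is not specified in this README. Kuhn Poker has two actions per decision, so `n_actions=2`.

```python
import numpy as np


class ConditionalPolicy:
    """One shared MLP; population member i is selected by embedding z[i]."""

    def __init__(self, obs_dim, n_actions=2, kmax=8, zdim=16, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.z = rng.normal(size=(kmax, zdim))  # one code vector per member
        self.w1 = rng.normal(scale=0.1, size=(obs_dim + zdim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(scale=0.1, size=(hidden, n_actions))
        self.b2 = np.zeros(n_actions)

    def action_probs(self, obs, member):
        """Action distribution of population member `member` at observation `obs`."""
        x = np.concatenate([obs, self.z[member]])  # condition on the member's code
        h = np.tanh(x @ self.w1 + self.b1)
        logits = h @ self.w2 + self.b2
        p = np.exp(logits - logits.max())
        return p / p.sum()
```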

---

## 9) Running P2SRO (`p2psro.py`)

**Description:** Pipeline PSRO (P2SRO) with parallel active levels and a fictitious‑play meta solver.

**Key args:**

* Outer: `--iters`, `--kmax`, `--levels`, `--meta_every`
* BR: `--br_steps`, `--br_lr`
* Freezing: `--freeze_every`, `--min_fixed`, `--plateau_check_every`
* Meta Solver: `--fp_iters`, `--fp_smooth`
* Numerics/I‑O: as above

**Example:**

```bash
python p2psro.py \
  --iters 40 --kmax 8 \
  --levels 4 --br_steps 50 --br_lr 5e-4 \
  --fp_iters 400 --meta_every 10 \
  --outdir runs/p2psro --csv_base p2psro_kuhn \
  --device auto --seeds "0,1,2,3,4" --no_plots
```
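The fictitious‑play meta solver can be sketched as repeated best responses to the running average of past best responses on the meta-game payoff matrix, with `iters` standing in for `--fp_iters`. This is plain (unsmoothed) fictitious play; how `--fp_smooth` modifies it in `p2psro.py` is not modeled.

```python
import numpy as np


def fictitious_play(payoff, iters=400):
    """Fictitious play in self-play on a symmetric zero-sum meta-game.

    payoff[i, j]: EV of policy i against policy j. Returns the empirical
    frequencies of the best responses played so far.
    """
    k = payoff.shape[0]
    counts = np.ones(k)  # uniform prior so the first best response is defined
    for _ in range(iters):
        avg = counts / counts.sum()
        br = int(np.argmax(payoff @ avg))  # best response to the running average
        counts[br] += 1.0
    return counts / counts.sum()
```

In two‑player zero‑sum games these empirical frequencies converge to a Nash equilibrium, though typically slowly.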

---

## 10) Outputs

Each script writes per‑seed or per‑run CSVs under `--outdir` with prefix `--csv_base`, e.g.:

```
runs/
  gems/
    gems_kuhn_const_seed0.csv
    ...
  psro/
    psro_kuhn_ppo_br_seed0.csv
    ...
```

Columns typically include iteration counters, exploitability/EV, BR losses, and timing; the exact set varies by script, so check each CSV header before aggregating across seeds.
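A standard-library sketch for aggregating one metric across the per‑seed CSVs laid out above. The column names `iter` and `exploitability` are placeholders; read the header of your CSVs and pass the real names.

```python
import csv
import glob
import os
import statistics
from collections import defaultdict


def aggregate(outdir, csv_base, column, iter_col="iter"):
    """Mean, sample std and seed count of `column` per iteration across seed CSVs."""
    by_iter = defaultdict(list)
    pattern = os.path.join(outdir, f"{csv_base}_seed*.csv")
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                by_iter[int(float(row[iter_col]))].append(float(row[column]))
    summary = {}
    for it, vals in sorted(by_iter.items()):
        sd = statistics.stdev(vals) if len(vals) > 1 else 0.0
        summary[it] = (statistics.mean(vals), sd, len(vals))
    return summary
```

For example, `aggregate("runs/gems", "gems_kuhn_const", "exploitability")` returns `{iteration: (mean, std, n_seeds)}`.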

---

## 11) Reproducibility

1. Use **Python 3.11.9**.
2. Install the **exact** library versions listed above.
3. Run the provided example commands unchanged.
4. Keep `--seeds` and `--eta_sched` consistent with our defaults.
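A quick way to confirm steps 1 and 2 before launching runs is to compare installed versions against the pinned ones. This sketch uses only the standard library (`importlib.metadata`); a CPU-only PyTorch install typically reports a different version string (for example `2.8.0+cpu`), so a `torch` mismatch is expected on machines without CUDA.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

EXPECTED = {"torch": "2.8.0+cu128", "numpy": "2.1.3", "matplotlib": "3.10.0"}
EXPECTED_PYTHON = "3.11.9"


def find_mismatches(expected=EXPECTED, expected_python=EXPECTED_PYTHON):
    """Map each mismatching name to (expected, found); found is None if missing."""
    mismatches = {}
    py = ".".join(map(str, sys.version_info[:3]))
    if py != expected_python:
        mismatches["python"] = (expected_python, py)
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None
        if have != want:
            mismatches[pkg] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in find_mismatches().items():
        print(f"{name}: expected {want}, found {have or 'not installed'}")
```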

---

## 12) Troubleshooting

* **CUDA errors / missing GPU:** Add `--device cpu` (for scripts that support it) or install the correct CUDA wheel for `torch`.
* **Plots popping up in batch runs:** Pass `--no_plots` / `--no-plot` (script‑specific flag name).
* **CSV empty or missing:** Check stdout for exceptions; ensure `--outdir` exists and is writable.
* **Different results vs paper:** Verify `--eta_sched`, `--ppo_lr`, and `--kmax`; these strongly affect convergence.
