### How to reproduce the Soccer and Gomoku experiments

#### Dependencies

- `soccer.cpp`: the grid-world soccer environment, implemented in C++ for speed. Compile it into a Python extension module with pybind11 (which must be installed):

  ```bash
  c++ -O3 -Wall -shared -std=c++11 -fPIC `python3 -m pybind11 --includes` soccer.cpp -o soccer`python3-config --extension-suffix`
  ```

- `gym_renju`: the modified Gomoku/Renju environment. See the README in the `gym_renju/` folder for details.

- OpenAI `gym`
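To see whether the dependencies above are in place, a small check script can help. This is a sketch, not part of the repository. It assumes the Python import names `pybind11`, `gym` and `gym_renju`, and that the compiled extension imports as `soccer`, which only works when the script is run from the directory where `soccer.cpp` was compiled. The function names `check` and `check_deps` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch (not part of the repo): report which dependencies are available.
# Assumed import names: pybind11, gym, gym_renju, and soccer (the compiled
# extension, importable only from the directory where it was built).

# check LABEL CMD...: run CMD quietly and print "ok" or "missing" for LABEL.
check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok      $label"
  else
    echo "missing $label"
  fi
}

# One status line per dependency, in a fixed order.
check_deps() {
  check "c++ compiler"   command -v c++
  check "python3-config" command -v python3-config
  check "pybind11"       python3 -c "import pybind11"
  check "gym"            python3 -c "import gym"
  check "gym_renju"      python3 -c "import gym_renju"
  check "soccer (built)" python3 -c "import soccer"
}

check_deps
```

Each line starts with `ok` or `missing`, so a `missing soccer (built)` line usually means the compile step above has not been run in the current directory.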



### Gomoku

#### Step 1: Pretrain the Gomoku agents

Download a collection of recorded games, e.g., from https://www.renjuoffline.com/.

Then, in the `gomoku/` directory, run `extract.py`, followed by `pretrain.py` several times to obtain a diverse set of pretrained agents. Alternatively, use `generate_init.py` to obtain diverse agents by perturbing the weights.



#### Step 2: Self-play

Bash commands to train the baseline self-play methods:

```bash
# baseline self-play (latest)
python train.py --env gomoku --method base --lr 0.001 --save result/gomoku/f2c1_base/0 --T 40 --batch 32 --test_T 40 --test_batch 32 --niter 40 --freeze 2 --anneal 20

# baseline self-play (random past)
python train.py --env gomoku --method baserand --lr 0.001 --save result/gomoku/f2c1_baserand/0 --T 40 --batch 32 --test_T 40 --test_batch 32 --niter 40 --freeze 2 --anneal 20

# baseline self-play (best past)
python train.py --env gomoku --method basebest --lr 0.001 --save result/gomoku/f2c1_basebest/0 --T 40 --batch 32 --test_T 40 --test_batch 32 --niter 40 --freeze 2 --anneal 20
```
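The three baseline commands differ only in `--method` and the matching `--save` directory, so they can also be launched from one loop. A minimal sketch that runs the jobs one after another: by default it only prints the commands, and `DRY_RUN=0` makes it execute them. The `DRY_RUN` switch and the function names are hypothetical helpers of this script, not options of `train.py`.

```bash
#!/usr/bin/env bash
# Sketch: run the three Gomoku baseline self-play jobs in sequence.
# DRY_RUN (a switch of this script, not of train.py) defaults to 1,
# which only prints each command; use DRY_RUN=0 to actually train.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Same flags as the commands above; only --method and --save change.
gomoku_baselines() {
  local method
  for method in base baserand basebest; do
    run python train.py --env gomoku --method "$method" --lr 0.001 \
      --save "result/gomoku/f2c1_${method}/0" \
      --T 40 --batch 32 --test_T 40 --test_batch 32 \
      --niter 40 --freeze 2 --anneal 20
  done
}

gomoku_baselines
```

Run it with `DRY_RUN=0` from the same directory you would run the commands above from. Running sequentially keeps the sketch simple; on a machine with spare capacity the jobs could instead be started in parallel.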

Command to train `Ours(6)`, i.e., our method with `--nagent 6`:

```bash
python train.py --env gomoku --method const --nagent 6 --lr 0.001 --save result/gomoku/f2c1_constv6/0 --T 40 --batch 32 --test_T 40 --test_batch 32 --niter 40 --freeze 2 --ninner 8 --anneal 20
```
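Before moving on to the tournament, it can help to confirm that each Gomoku run produced its output. A sketch that checks for the four `--save` directories used in the commands above; it assumes `train.py` creates the directory passed to `--save`, which is not stated here, and `check_gomoku_runs` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: list which of the Gomoku --save directories from Step 2 exist.
# Assumes train.py creates the directory given to --save.

# check_gomoku_runs [ROOT]: ROOT is where the training commands were run (default: .).
check_gomoku_runs() {
  local root="${1:-.}"
  local run
  for run in f2c1_base f2c1_baserand f2c1_basebest f2c1_constv6; do
    if [ -d "$root/result/gomoku/$run/0" ]; then
      echo "found   result/gomoku/$run/0"
    else
      echo "missing result/gomoku/$run/0"
    fi
  done
}

check_gomoku_runs "$@"
```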



#### Step 3: Evaluate through a tournament

```bash
python tour.py --config gomoku_f2c1 --env gomoku --T 40 --batch 100 --n 30
```

Besides running the tournament, this also computes the Elo scores.
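For reference, in the standard Elo model a player rated `ra` facing a player rated `rb` has expected score `1 / (1 + 10^((rb - ra) / 400))`, and after scoring `s` (1 win, 0.5 draw, 0 loss) its rating moves by `k * (s - expected)`. The sketch below implements these two formulas with `awk`. It is not taken from `tour.py`: whether the tournament uses this incremental update, another `k`, or a batch fit over all games is not stated here, so treat it only as an illustration of what the scores mean. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the standard Elo formulas (not taken from tour.py).

# elo_expected RA RB: expected score of a player rated RA against one rated RB.
elo_expected() {
  awk -v ra="$1" -v rb="$2" \
    'BEGIN { printf "%.4f\n", 1 / (1 + 10 ^ ((rb - ra) / 400)) }'
}

# elo_update RA RB S [K]: RA's new rating after scoring S against RB
# (S = 1 win, 0.5 draw, 0 loss; K defaults to 32, a common choice).
elo_update() {
  awk -v ra="$1" -v rb="$2" -v s="$3" -v k="${4:-32}" \
    'BEGIN { e = 1 / (1 + 10 ^ ((rb - ra) / 400)); printf "%.1f\n", ra + k * (s - e) }'
}

elo_expected 1600 1400   # prints 0.7597
elo_update 1400 1600 1   # prints 1424.3
```

A 200-point gap corresponds to an expected score of about 0.76 for the stronger player, so an upset win moves the weaker player's rating up by about three quarters of `k`.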



### Soccer

Bash commands to train the baseline self-play methods:

```bash
# baseline self-play (latest)
python train.py --lr 0.1 --tabular --method base --save result/soccer/tab_base/0 --T 50 --test_T 50 --batch 32

# baseline self-play (random past)
python train.py --lr 0.1 --tabular --method baserand --save result/soccer/tab_baserand/0 --T 50 --test_T 50 --batch 32

# baseline self-play (best past)
python train.py --lr 0.1 --tabular --method basebest --save result/soccer/tab_basebest/0 --T 50 --test_T 50 --batch 32
```

Command to train `Ours(6)`, i.e., our method with `--nagent 6`:

```bash
python train.py --method const --nagent 6 --lr 0.1 --tabular --save result/soccer/const6/0 --T 50 --test_T 50 --batch 32
```

Evaluate through a tournament (this also computes the Elo scores):

```bash
python tour.py --config soccer_tab --env soccer --T 40 --batch 20 --n 30
```
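The four Soccer runs and the evaluation can be chained in one script. A minimal sketch that reuses the exact commands above and runs them in sequence, so the tournament starts only after all training has finished. The `DRY_RUN` switch (default 1, print only) and the function names are hypothetical helpers of this script, not options of `train.py` or `tour.py`.

```bash
#!/usr/bin/env bash
# Sketch: train the three Soccer baselines and Ours(6), then run the tournament.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

soccer_pipeline() {
  local method
  # Baselines: only --method and the --save directory change.
  for method in base baserand basebest; do
    run python train.py --lr 0.1 --tabular --method "$method" \
      --save "result/soccer/tab_${method}/0" --T 50 --test_T 50 --batch 32
  done
  # Ours(6)
  run python train.py --method const --nagent 6 --lr 0.1 --tabular \
    --save result/soccer/const6/0 --T 50 --test_T 50 --batch 32
  # Tournament and Elo scores, after all training runs have finished.
  run python tour.py --config soccer_tab --env soccer --T 40 --batch 20 --n 30
}

soccer_pipeline
```

With `set -e`, a failing training run stops the script before the tournament is started on incomplete results.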

