# KataGo Win-Probability Claim-Consistency Experiment

This project ports the repo's synthetic "generated rationale + scalar claim + verifier" recipe into a Go setting. The model reads a Go position, autoregressively generates short commentary, emits an inline win-probability claim token, and is trained so that the rationale span shares hidden-state structure with the claim.

## Concept

Each training example uses a single autoregressive sequence:

```text
[BOS] <position tokens...> [RAT] <rationale tokens...> [CLAIM] V7 [EOS]
```
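A minimal sketch of assembling that layout, assuming the rationale has already been split into tokens (whitespace splitting below is only for illustration). `build_sequence` is a hypothetical name, not necessarily the function used in `katago_winprob_experiment.py`.

```python
from typing import List

def build_sequence(position_tokens: List[str], rationale_tokens: List[str], win_prob_bin: int) -> List[str]:
    """Hypothetical helper: lay out [BOS] <position> [RAT] <rationale> [CLAIM] V<bin> [EOS]."""
    if not 0 <= win_prob_bin <= 9:
        raise ValueError(f"win_prob_bin must be in 0..9, got {win_prob_bin}")
    return (
        ["[BOS]"]
        + list(position_tokens)
        + ["[RAT]"]
        + list(rationale_tokens)
        + ["[CLAIM]", f"V{win_prob_bin}", "[EOS]"]
    )

seq = build_sequence(["SZ19", "TM_B", "B_D4"], "black is ahead".split(), 7)
print(seq)
# ['[BOS]', 'SZ19', 'TM_B', 'B_D4', '[RAT]', 'black', 'is', 'ahead', '[CLAIM]', 'V7', '[EOS]']
```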

The model is a small decoder-only transformer trained with:

- LM loss over the whole sequence.
- Scalar regression loss from the `[CLAIM]` hidden state to predict continuous win probability.
- Consistency/bin loss from mean-pooled rationale hidden states to predict the win-probability bin.
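The sketch below shows which hidden states feed which head, using plain lists in place of the `[batch, seq, d_model]` tensors the real model would use. Two assumptions are made here and are not confirmed by this README: the pooled rationale span is everything strictly between `[RAT]` and `[CLAIM]`, and the scalar head reads the hidden state at the `[CLAIM]` position itself (the position whose next-token target is the `V` claim token).

```python
from typing import List, Tuple

def claim_and_rationale_positions(tokens: List[str]) -> Tuple[List[int], int]:
    """Return (rationale positions to mean-pool, [CLAIM] position for the scalar head)."""
    rat = tokens.index("[RAT]")
    claim = tokens.index("[CLAIM]")
    return list(range(rat + 1, claim)), claim

def mean_pool(hidden: List[List[float]], positions: List[int]) -> List[float]:
    """Average the hidden-state vectors at the given positions."""
    if not positions:
        raise ValueError("empty rationale span")
    dim = len(hidden[0])
    return [sum(hidden[p][d] for p in positions) / len(positions) for d in range(dim)]

tokens = ["[BOS]", "SZ19", "[RAT]", "black", "ahead", "[CLAIM]", "V7", "[EOS]"]
hidden = [[float(i), 2.0 * i] for i in range(len(tokens))]  # stand-in for model outputs
rat_pos, claim_pos = claim_and_rationale_positions(tokens)
print(rat_pos, claim_pos, mean_pool(hidden, rat_pos))
# [3, 4] 5 [3.5, 7.0]
```

In a batched PyTorch version the same idea is usually a masked mean: multiply the hidden states by a 0/1 rationale mask, sum over the sequence axis, and divide by the mask count.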

KataGo acts as the oracle for the claim. Training data should already contain `win_prob` and `win_prob_bin`, so training does not need live engine calls. Optional preprocessing helpers can build JSONL from SGFs and query a local KataGo engine.

## Why This Matters

The broader hypothesis is domain-agnostic:

- Generated prose and verifiable claims can share hidden states.
- A domain oracle can supervise only the claims.
- A consistency loss can drag the prose representation into alignment through the shared residual stream.

This Go setup is a cleaner test than open-ended factual QA because the target is numeric and comes from a fixed engine oracle rather than from human-judged free text.

## Relation To Prior Experiments

- The synthetic generated-rationale scalar experiment in `scalarverifier/` showed the strongest result so far: `rationale_only` and `full_consistency` reached perfect claim-bin accuracy, and on every counterfactual the rationale classifier followed the original bin (perfect orig-following).
- The FEVER from-scratch experiment was a weaker bridge task because it focused on evidence classification rather than oracle-supervised generated claims, so it did not isolate the intended mechanism as cleanly.

## Files

| File | Purpose |
|------|---------|
| `katago_winprob_experiment.py` | Core logic: dataset loading, vocab, model, training, evaluation, SGF/KataGo preprocessing |
| `run_katago_winprob_gpu.py` | CLI runner with GPU checks and smoke-test support |
| `modal_katago_winprob_run.py` | Modal A10G launcher patterned after the existing repo launchers |
| `README.md` | This guide |

## Dataset Format

Each JSONL row is expected to look like:

```json
{
  "id": "game123_move87",
  "board_size": 19,
  "to_move": "B",
  "rules": "japanese",
  "komi": 6.5,
  "stones": {
    "black": ["D4", "Q16", "K10"],
    "white": ["C3", "R17", "K11"]
  },
  "position_tokens": ["SZ19", "TM_B", "KOMI_6_5", "RULE_JAPANESE", "B_D4", "B_Q16", "B_K10", "W_C3"],
  "rationale_text": "Black is ahead because the center is thick and White's top side is unsettled.",
  "win_prob": 0.73,
  "win_prob_bin": 7
}
```

Notes:

- `win_prob` is treated as a scalar in `[0, 1]`.
- `win_prob_bin` is an integer from `0` to `9` that selects the claim token `V0` to `V9` (the sample row's bin `7` becomes the `V7` claim token).
- If `position_tokens` are absent, the loader rebuilds them from `board_size`, `to_move`, `komi`, `rules`, and `stones`.
- If `rationale_text` is absent, the loader falls back to a simple template.
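A small row check in the spirit of these notes is sketched below. The binning rule `min(int(p * 10), 9)` is an assumption (it agrees with the sample row, where `0.73` maps to bin `7`), so the check reports a mismatch instead of rejecting the row.

```python
import json
from typing import Tuple

N_BINS = 10  # claim tokens V0..V9

def winprob_to_bin(p: float, n_bins: int = N_BINS) -> int:
    """Assumed equal-width binning; p == 1.0 is folded into the top bin."""
    return min(int(p * n_bins), n_bins - 1)

def check_row(line: str) -> Tuple[float, int, bool]:
    """Parse one JSONL line; return (win_prob, win_prob_bin, bin matches the assumed binning)."""
    row = json.loads(line)
    p = float(row["win_prob"])
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"{row.get('id')}: win_prob {p} outside [0, 1]")
    b = int(row["win_prob_bin"])
    if not 0 <= b < N_BINS:
        raise ValueError(f"{row.get('id')}: win_prob_bin {b} has no V{b} claim token")
    return p, b, b == winprob_to_bin(p)

print(check_row('{"id": "game123_move87", "win_prob": 0.73, "win_prob_bin": 7}'))
# (0.73, 7, True)
```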

## Position Tokenization

The v1 tokenizer is symbolic and intentionally simple:

- `SZ19`
- `TM_B`
- `KOMI_6_5`
- `RULE_JAPANESE`
- `B_D4`
- `W_Q16`

If the position token list is too long, the implementation preserves the first few metadata tokens and truncates later stone tokens first. This keeps the sequence within `max_seq_len` while retaining board context.
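A hedged sketch of both steps follows. It assumes exactly four leading metadata tokens, formats komi with one decimal place, lists black stones before white ones, and budgets the position as `max_seq_len` minus the rationale length minus five special tokens (`[BOS]`, `[RAT]`, `[CLAIM]`, the `V` token, `[EOS]`). The helper names and the budget rule are illustrative, not taken from the implementation.

```python
from typing import Dict, List

N_META = 4     # assumed: SZ, TM, KOMI, RULE lead every position
N_SPECIAL = 5  # [BOS], [RAT], [CLAIM], V<bin>, [EOS]

def position_tokens(board_size: int, to_move: str, komi: float, rules: str,
                    stones: Dict[str, List[str]]) -> List[str]:
    """Rebuild symbolic tokens such as SZ19, TM_B, KOMI_6_5, RULE_JAPANESE, B_D4, W_Q16."""
    komi_tok = f"{komi:.1f}".replace(".", "_")  # 6.5 -> "6_5" (assumed formatting)
    meta = [f"SZ{board_size}", f"TM_{to_move.upper()}", f"KOMI_{komi_tok}", f"RULE_{rules.upper()}"]
    black = [f"B_{v}" for v in stones.get("black", [])]
    white = [f"W_{v}" for v in stones.get("white", [])]
    return meta + black + white

def fit_position(tokens: List[str], max_seq_len: int, n_rationale: int) -> List[str]:
    """Keep the metadata and drop stone tokens from the end until the sequence fits."""
    budget = max_seq_len - n_rationale - N_SPECIAL
    if budget < N_META:
        raise ValueError("max_seq_len too small to keep the metadata tokens")
    return tokens[:budget]

toks = position_tokens(19, "B", 6.5, "japanese",
                       {"black": ["D4", "Q16", "K10"], "white": ["C3", "R17", "K11"]})
print(fit_position(toks, max_seq_len=18, n_rationale=5))
# ['SZ19', 'TM_B', 'KOMI_6_5', 'RULE_JAPANESE', 'B_D4', 'B_Q16', 'B_K10', 'W_C3']
```

With these numbers the result matches the `position_tokens` list in the sample row above. Because black stones are listed first in this sketch, truncation drops white stones first; ordering stones by move number would be an alternative.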

## Variants

The experiment implements the same five variants as the synthetic reference:

- `lm_only`: LM loss only.
- `no_consistency_loss`: LM loss + scalar loss.
- `rationale_only`: LM loss + consistency/bin loss.
- `full_consistency`: LM loss + scalar loss + consistency/bin loss.
- `random_consistency`: LM loss + scalar loss + consistency head trained on random bins.

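These ablations separate genuine coupling from accidental behavior: `lm_only` shows what the LM objective produces on its own, and `random_consistency` keeps the extra head and its gradient while removing the true bin signal.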
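The variant list can be read as a table of which loss terms are switched on. The sketch below transcribes it with plain floats standing in for loss tensors; the consistency weight default mirrors the `--consistency-weight 0.5` example further down, while the LM and scalar weights of `1.0` are assumptions.

```python
import random
from typing import Dict, List, Sequence

# Loss terms switched on by each variant, transcribed from the list above.
VARIANT_TERMS: Dict[str, tuple] = {
    "lm_only": ("lm",),
    "no_consistency_loss": ("lm", "scalar"),
    "rationale_only": ("lm", "consistency"),
    "full_consistency": ("lm", "scalar", "consistency"),
    "random_consistency": ("lm", "scalar", "consistency"),
}

def total_loss(variant: str, losses: Dict[str, float],
               consistency_weight: float = 0.5, scalar_weight: float = 1.0) -> float:
    """Weighted sum of the active loss terms (LM and scalar weights of 1.0 are assumed)."""
    weights = {"lm": 1.0, "scalar": scalar_weight, "consistency": consistency_weight}
    return sum(weights[term] * losses[term] for term in VARIANT_TERMS[variant])

def consistency_targets(variant: str, true_bins: Sequence[int], rng: random.Random,
                        n_bins: int = 10) -> List[int]:
    """random_consistency trains the head on random bins; the other variants use true bins."""
    if variant == "random_consistency":
        return [rng.randrange(n_bins) for _ in true_bins]
    return list(true_bins)

losses = {"lm": 2.0, "scalar": 0.1, "consistency": 1.0}
for name in VARIANT_TERMS:
    print(name, round(total_loss(name, losses), 3))
```

Keeping the variants as data rather than branches makes it easy to confirm that `full_consistency` and `random_consistency` differ only in their consistency targets.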

## Metrics

Per variant, the project saves:

- `token_acc`
- `claim_bin_acc`
- `scalar_mse`
- `cfact_cls_follows_swap`
- `cfact_cls_follows_orig`
- `cfact_scalar_mse_to_swap`
- `pearson_r_winprob`
- `spearman_r_winprob`
- `mae_winprob`

Interpretation:

- High `claim_bin_acc` means the rationale-pooled hidden states support the inline claim bin.
- Low `scalar_mse` and low `mae_winprob` mean the scalar head at the claim position predicts win probability accurately.
- The counterfactual metrics check what happens when rationale text is swapped across positions with different bins.
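The scalar-quality metrics are standard statistics. A dependency-free sketch is below; it uses average ranks for ties in Spearman's correlation, which is the usual convention, though the project may instead rely on a library routine.

```python
import math
from typing import List, Sequence

def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return float("nan") if sx == 0 or sy == 0 else cov / (sx * sy)

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))

pred, target = [0.1, 0.4, 0.35, 0.8], [0.0, 0.5, 0.3, 0.9]
print(mae(pred, target), pearson(pred, target), spearman(pred, target))
```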

## Counterfactual Evaluation

For sampled example `A`, the code picks a different-bin example `B` and forms:

- position from `A`
- rationale from `B`
- claim placeholder location from `A`

Then it checks whether the rationale-pooled classifier follows:

- `A`'s original bin
- `B`'s swapped rationale bin

This mirrors the structure of the synthetic experiment and keeps the evaluation modular, so the pairing and scoring rules can be changed later without touching training.
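A minimal sketch of the pairing and the two follow rates, under stated assumptions: the spliced sequence ends at `[CLAIM]` (the claim slot taken from `A`), and `pred_bins` holds the rationale classifier's predictions on those spliced sequences. `cfact_scalar_mse_to_swap` is not covered here, and the quadratic candidate search is only meant for small evaluation samples.

```python
import random
from typing import Dict, List, Sequence, Tuple

def splice(position_a: List[str], rationale_b: List[str]) -> List[str]:
    """Position from A, rationale from B, claim slot where A's claim would go (assumed layout)."""
    return ["[BOS]", *position_a, "[RAT]", *rationale_b, "[CLAIM]"]

def make_pairs(bins: Sequence[int], rng: random.Random) -> List[Tuple[int, int]]:
    """Pair each example A with a random example B whose bin differs."""
    pairs = []
    for a, bin_a in enumerate(bins):
        candidates = [b for b, bin_b in enumerate(bins) if bin_b != bin_a]
        if candidates:  # skip A if every example shares its bin
            pairs.append((a, rng.choice(candidates)))
    return pairs

def follow_rates(pred_bins: Sequence[int], bins: Sequence[int],
                 pairs: Sequence[Tuple[int, int]]) -> Dict[str, float]:
    """Fraction of spliced predictions matching A's original bin vs. B's swapped bin."""
    n = len(pairs)
    orig = sum(p == bins[a] for p, (a, _) in zip(pred_bins, pairs))
    swap = sum(p == bins[b] for p, (_, b) in zip(pred_bins, pairs))
    return {"cfact_cls_follows_orig": orig / n, "cfact_cls_follows_swap": swap / n}

bins = [0, 0, 5, 9]
pairs = make_pairs(bins, random.Random(0))
# A classifier that ignores the swapped rationale always predicts A's bin.
print(follow_rates([bins[a] for a, _ in pairs], bins, pairs))
# {'cfact_cls_follows_orig': 1.0, 'cfact_cls_follows_swap': 0.0}
```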

## Preprocessing From SGF + KataGo

`katago_winprob_experiment.py` includes `preprocess-sgf`, which can:

- read SGF files,
- sample positions every `N` moves,
- reconstruct stone lists and symbolic position tokens,
- optionally query a local KataGo binary in GTP mode,
- write JSONL rows with `win_prob`, `win_prob_bin`, and a templated rationale.

If KataGo paths are not supplied, preprocessing falls back to a lightweight heuristic win-probability generator so the pipeline remains usable for smoke testing and scaffolding.
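The sampling step can be sketched independently of SGF parsing. This hypothetical helper snapshots the board after every `N` moves and keeps the earliest `--max-positions-per-game` snapshots; whether the real code keeps the earliest ones or spreads them across the game is not stated here, so treat that choice as an assumption.

```python
from typing import List

def sample_move_numbers(n_moves: int, every_n: int, max_positions: int) -> List[int]:
    """Move numbers after which to snapshot the board: every_n, 2*every_n, ..., capped."""
    if every_n <= 0:
        raise ValueError("every_n must be positive")
    return list(range(every_n, n_moves + 1, every_n))[:max_positions]

def row_id(game_name: str, move_number: int) -> str:
    """Build ids in the style of the sample row, e.g. game123_move87."""
    return f"{game_name}_move{move_number}"

print(sample_move_numbers(250, every_n=20, max_positions=8))
# [20, 40, 60, 80, 100, 120, 140, 160]
```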

The KataGo helper is deliberately modest in scope. It uses `boardsize`, `komi`, `kata-set-rules`, `set_position`, and a lightweight analysis request, then parses root winrate as a first-pass preprocessing tool rather than a full engine integration.
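The setup part of that GTP exchange might look like the sketch below, which only builds the command strings named above (`boardsize`, `komi`, `kata-set-rules`, `set_position`, plus the standard `clear_board` for an empty board). The follow-up analysis request and its parsing are deliberately left out because their exact form is not specified here.

```python
from typing import Dict, List

def gtp_setup_commands(board_size: int, komi: float, rules: str,
                       stones: Dict[str, List[str]]) -> List[str]:
    """Hypothetical helper: GTP lines that load a position into KataGo before analysis."""
    placements = [f"B {v}" for v in stones.get("black", [])] + [f"W {v}" for v in stones.get("white", [])]
    cmds = [f"boardsize {board_size}", f"komi {komi}", f"kata-set-rules {rules}"]
    # set_position replaces the whole board; an empty position just needs clear_board.
    cmds.append("set_position " + " ".join(placements) if placements else "clear_board")
    return cmds

for cmd in gtp_setup_commands(19, 6.5, "japanese", {"black": ["D4"], "white": ["C3"]}):
    print(cmd)
```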

## Usage

### 1. Install dependencies

```bash
pip install torch numpy pandas sgfmill
```

### 2. Build a dataset from SGFs

```bash
python katago/katago_winprob_experiment.py preprocess-sgf \
  --sgf-dir /path/to/sgfs \
  --output-path /path/to/katago_train.jsonl \
  --sample-every-n-moves 20 \
  --max-positions-per-game 8 \
  --katago-binary /path/to/katago \
  --katago-model /path/to/model.bin.gz \
  --katago-config /path/to/analysis.cfg
```

### 3. Run a CPU smoke test

This auto-generates a tiny mock dataset if no dataset paths are provided.

```bash
python katago/run_katago_winprob_gpu.py --smoke-test --cpu-smoke-ok
```

### 4. Run a full local GPU experiment

```bash
python katago/run_katago_winprob_gpu.py \
  --require-gpu \
  --train-path /path/to/train.jsonl \
  --eval-path /path/to/eval.jsonl \
  --batch-size 32 \
  --epochs 10 \
  --d-model 256 \
  --n-layers 4 \
  --n-heads 8 \
  --d-ff 1024 \
  --consistency-weight 0.5 \
  --output-csv katago_winprob_results.csv
```

### 5. Run preprocessing + training on Modal

The Modal launcher is designed for the repo's local `gtlreviews/` SGF corpus. It uploads that SGF directory into the Modal image, downloads KataGo assets at runtime, preprocesses SGFs into JSONL on Modal, splits train/eval, and then optionally runs training in the same job.
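This README does not say how the train/eval split is drawn. One reasonable option, sketched here as an assumption rather than a description of the launcher, is to split by game so that positions from one game never appear on both sides. It relies on the `game123_move87` id style from the sample row and hashes the game name so the assignment is stable across runs.

```python
import hashlib
from typing import Dict, List, Tuple

def game_of(row_id: str) -> str:
    """'game123_move87' -> 'game123' (assumes the id style of the sample row)."""
    return row_id.rsplit("_move", 1)[0]

def split_by_game(rows: List[Dict], eval_frac: float = 0.1) -> Tuple[List[Dict], List[Dict]]:
    """Deterministically assign whole games to train or eval by hashing the game name."""
    train, eval_rows = [], []
    for row in rows:
        digest = hashlib.sha256(game_of(row["id"]).encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % 1000 / 1000  # value in [0, 1)
        (eval_rows if bucket < eval_frac else train).append(row)
    return train, eval_rows

rows = [{"id": f"game{g}_move{m}"} for g in range(20) for m in (20, 40)]
train, eval_rows = split_by_game(rows, eval_frac=0.3)
print(len(train), len(eval_rows))
```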

The launcher defaults to the following official KataGo setup:

- KataGo release: `v1.16.4`
- Engine asset: `katago-v1.16.4-cuda12.8-cudnn9.8.0-linux-x64.zip`
- Network: `kata1-b28c512nbt-s12434156288-d5719330235`
- Config: official `analysis_example.cfg`
- Default visits in Modal preprocessing: `1000`

You can still override any of these via environment variables:

```bash
export OUTPUT_STEM=katago_gtlreviews_modal
export KATAGO_EXECUTABLE_URL="https://github.com/lightvector/KataGo/releases/download/v1.16.4/katago-v1.16.4-cuda12.8-cudnn9.8.0-linux-x64.zip"
export KATAGO_MODEL_URL="https://media.katagotraining.org/uploaded/networks/models/kata1/kata1-b28c512nbt-s12434156288-d5719330235.bin.gz"
export KATAGO_CONFIG_URL="https://raw.githubusercontent.com/lightvector/KataGo/master/cpp/configs/analysis_example.cfg"
export KATAGO_ANALYSIS_VISITS=1000
export KATAGO_MAX_GAMES=1000
export KATAGO_SAMPLE_EVERY_N_MOVES=40
export KATAGO_MAX_POSITIONS_PER_GAME=4
export RUN_TRAINING=1

python -m modal run katago/modal_katago_winprob_run.py
```

Important notes:

- `KATAGO_EXECUTABLE_URL` should point to a Linux KataGo release archive from the official `lightvector/KataGo` releases page.
- `KATAGO_MODEL_URL` should point to a `.bin.gz` network from `katagotraining.org`.
- `KATAGO_CONFIG_URL` should point to an analysis-engine config file such as the official `analysis_example.cfg`.
- The preprocessing code now uses KataGo's JSON `analysis` engine rather than GTP, which is the official mode intended for high-throughput batched analysis.
- The Modal manifest will include the preprocessing logs, the generated full/train/eval JSONL files, and, if `RUN_TRAINING=1`, the CSV/markdown experiment outputs too.
- If you only want dataset generation on Modal first, set `RUN_TRAINING=0`.
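For orientation, KataGo's analysis engine (started as `katago analysis -config ... -model ...`) reads one JSON query per line on stdin and writes one JSON response per line on stdout. The sketch below builds a query for a dataset row using the engine's `initialStones`, `initialPlayer`, `moves`, `rules`, `komi`, `boardXSize`, `boardYSize`, and `maxVisits` fields, then reads `rootInfo.winrate` from a response. The helper names are hypothetical, and the winrate's perspective depends on `reportAnalysisWinratesAs` in the config, so it may need flipping to match whatever convention the dataset uses.

```python
import json
from typing import Dict

def analysis_query(row: Dict, max_visits: int = 1000) -> str:
    """Build one analysis-engine query line for a dataset row (no move history, stones only)."""
    stones = [["B", v] for v in row["stones"]["black"]] + [["W", v] for v in row["stones"]["white"]]
    query = {
        "id": row["id"],
        "initialStones": stones,
        "initialPlayer": row["to_move"],
        "moves": [],
        "rules": row["rules"],
        "komi": row["komi"],
        "boardXSize": row["board_size"],
        "boardYSize": row["board_size"],
        "maxVisits": max_visits,
    }
    return json.dumps(query)

def root_winrate(response_line: str) -> float:
    """Extract the root winrate; perspective follows reportAnalysisWinratesAs in the config."""
    resp = json.loads(response_line)
    if "error" in resp:
        raise RuntimeError(resp["error"])
    return float(resp["rootInfo"]["winrate"])

row = {"id": "game123_move87", "board_size": 19, "to_move": "B", "rules": "japanese",
       "komi": 6.5, "stones": {"black": ["D4"], "white": ["C3"]}}
print(analysis_query(row))
```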

## What A Successful Result Looks Like

The synthetic reference showed a strong signature:

- `rationale_only` and `full_consistency` reached perfect claim-bin accuracy.
- Counterfactual classification strongly favored the original aligned state in that setup.

For the KataGo version, a promising result would usually look like:

- `full_consistency` matching or beating `no_consistency_loss` on scalar quality.
- `rationale_only` or `full_consistency` clearly outperforming `lm_only` on `claim_bin_acc`.
- Counterfactual metrics differing in a stable way between consistency-trained variants and the baselines.

## Engineering Notes

- Python 3.11
- PyTorch only, plus modest data dependencies
- Decoder-only transformer written from scratch
- Reproducible seeds
- CSV and markdown outputs
- No notebook dependency

The code is intentionally close in structure to `scalarverifier/generated_rationale_scalar_verifier_experiment.py` so the two projects can be compared side by side.
