# KataGo Win-Probability Claim-Consistency Experiment

## Setup Summary

- Train dataset: `/root/work/katago_winprob_20260429_train.jsonl`
- Eval dataset: `/root/work/katago_winprob_20260429_eval.jsonl`
- Train rows: `3373`
- Eval rows: `375`
- Variants: `lm_only`, `no_consistency_loss`, `rationale_only`, `full_consistency`, `random_consistency`
- Batch size: `32`
- Epochs: `10`
- LR: `0.0003`
- Model: `d_model=256`, `layers=4`, `heads=8`, `d_ff=1024`
- Max sequence length: `256`
- Consistency weight: `0.5`
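
The settings above imply a few derived quantities that are useful when reading the results. The following is a minimal sketch, not the experiment's actual config code: the `RunConfig` class and its field names are hypothetical, and it assumes the final partial batch of each epoch is kept rather than dropped.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the settings listed in the setup summary."""
    train_rows: int = 3373
    eval_rows: int = 375
    batch_size: int = 32
    epochs: int = 10
    lr: float = 3e-4
    d_model: int = 256
    layers: int = 4
    heads: int = 8
    d_ff: int = 1024
    max_seq_len: int = 256
    consistency_weight: float = 0.5

    @property
    def steps_per_epoch(self) -> int:
        # Assumption: the last partial batch is kept, hence the ceiling.
        return math.ceil(self.train_rows / self.batch_size)

    @property
    def total_steps(self) -> int:
        return self.steps_per_epoch * self.epochs

    @property
    def head_dim(self) -> int:
        # Per-head width for standard multi-head attention.
        return self.d_model // self.heads

    @property
    def eval_fraction(self) -> float:
        return self.eval_rows / (self.train_rows + self.eval_rows)


cfg = RunConfig()
print(cfg.steps_per_epoch, cfg.total_steps, cfg.head_dim, round(cfg.eval_fraction, 3))
# prints: 106 1060 32 0.1
```

Under these assumptions the run is roughly a thousand optimizer steps with a 10% held-out split, which is worth keeping in mind when comparing variants on 375 eval rows.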

## Results

| variant             | token_acc | claim_bin_acc | scalar_mse | mae_winprob | pearson_r_winprob | spearman_r_winprob | cfact_cls_follows_swap | cfact_cls_follows_orig | cfact_scalar_mse_to_swap |
| ------------------- | --------- | ------------- | ---------- | ----------- | ----------------- | ------------------ | ---------------------- | ---------------------- | ------------------------ |
| lm_only             | 0.477475  | 0.016000      | 0.160649   | 0.378542    | 0.361976          | 0.487018           | 0.101562               | 0.105469               | 0.158048                 |
| no_consistency_loss | 0.478094  | 0.250667      | 0.001615   | 0.026827    | 0.995273          | 0.931984           | 0.203125               | 0.207031               | 0.002952                 |
| rationale_only      | 0.452770  | 0.786667      | 0.180746   | 0.396172    | 0.153823          | 0.188011           | 0.519531               | 0.144531               | 0.175504                 |
| full_consistency    | 0.454690  | 0.810667      | 0.002049   | 0.027997    | 0.994257          | 0.917647           | 0.546875               | 0.144531               | 0.003664                 |
| random_consistency  | 0.467776  | 0.160000      | 0.001658   | 0.028389    | 0.995102          | 0.935408           | 0.167969               | 0.101562               | 0.003342                 |
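
For reference, the four regression columns can be reproduced from paired predictions and oracle targets roughly as follows. This is a standard-library sketch under assumptions: the function names are hypothetical, `scalar_mse` is assumed to be the mean squared error on the same win-probability scale as `mae_winprob`, and Spearman correlation is computed as Pearson correlation over average ranks. The experiment's own evaluation code may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def regression_metrics(pred, target):
    """Compute the scalar columns of the results table (assumed definitions)."""
    err = [p - t for p, t in zip(pred, target)]
    return {
        "scalar_mse": sum(e * e for e in err) / len(err),
        "mae_winprob": sum(abs(e) for e in err) / len(err),
        "pearson_r_winprob": pearson(pred, target),
        "spearman_r_winprob": pearson(average_ranks(pred), average_ranks(target)),
    }


# Toy values for illustration only; these are not experiment data.
metrics = regression_metrics([0.1, 0.4, 0.35, 0.8], [0.0, 0.5, 0.3, 0.9])
print(metrics)
```

Reporting both correlations is informative here: Pearson measures linear agreement with the oracle win probability, while Spearman only measures whether positions are ordered correctly.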

## Variant Notes

- `lm_only`: pure LM baseline. It should model commentary tokens but is expected to underperform on calibrated oracle claims.
- `no_consistency_loss`: LM + scalar baseline. It can learn a continuous win probability without directly tying rationale-pooled states to the bin claim.
- `rationale_only`: LM + rationale consistency only. Strong results here are the most direct evidence that rationale-pooled hidden states alone can support the inline claim.
- `full_consistency`: full objective. If this wins on both bin accuracy and scalar quality, the shared hidden-state mechanism is working end to end.
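- `random_consistency`: control variant. If the consistency signal itself matters, this variant should not match `full_consistency` on claim-bin accuracy.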
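
One way to read the variants is as different subsets of loss terms combined with the consistency weight of `0.5` from the setup. The sketch below is illustrative only: the `VARIANT_TERMS` mapping and `total_loss` function are hypothetical names, the LM and scalar terms are assumed to have unit weight, and the terms assigned to `random_consistency` are a guess, since the notes above do not describe what that control randomizes.

```python
CONSISTENCY_WEIGHT = 0.5  # "Consistency weight" from the setup summary

# Loss terms per variant, as read from the variant notes.
VARIANT_TERMS = {
    "lm_only": {"lm"},
    "no_consistency_loss": {"lm", "scalar"},
    "rationale_only": {"lm", "consistency"},
    "full_consistency": {"lm", "scalar", "consistency"},
    # Assumption: same terms as full_consistency, with the consistency
    # term computed against randomized pairings or targets.
    "random_consistency": {"lm", "scalar", "consistency"},
}


def total_loss(variant, lm_loss, scalar_loss, consistency_loss,
               consistency_weight=CONSISTENCY_WEIGHT):
    """Combine per-batch loss values for one variant (assumed unit weights)."""
    terms = VARIANT_TERMS[variant]
    total = lm_loss  # every variant trains the language-model objective
    if "scalar" in terms:
        total += scalar_loss
    if "consistency" in terms:
        total += consistency_weight * consistency_loss
    return total


# Toy loss values for illustration only.
for name in VARIANT_TERMS:
    print(name, total_loss(name, lm_loss=2.0, scalar_loss=0.1, consistency_loss=0.4))
```

Keeping the term selection in a single table makes the ablation structure explicit: each variant differs from `full_consistency` by removing or perturbing one term.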

## Interpretation Notes

- The earlier synthetic generated-rationale scalar experiment showed perfect claim-bin accuracy and perfect orig-following counterfactual behavior for `rationale_only` and `full_consistency`.
- This KataGo version is a stronger domain test because the language model emits both commentary and an inline claim while only the claim is supervised by an oracle.
- FEVER-from-scratch was a weaker bridge task because it supervised evidence classification rather than oracle-graded generated claims.
