## Table 1

| Variable/Parameter | Definition/Formula |
| :--- | :--- |
| Completion Group | $o^{(1)}, o^{(2)}, \dots, o^{(G)}$, the $G$ completions sampled for the same board |
| Scalar Reward | $r^{(i)} = r(\text{board}, o^{(i)})$ |
| Mean Reward | $\bar{r} = \frac{1}{G}\sum_{i=1}^{G} r^{(i)}$ |
| Standard Deviation of Rewards | $\sigma_r = \text{std}(r^{(1)}, \dots, r^{(G)})$ |
| Group-Normalized Advantage | $A^{(i)} = \frac{r^{(i)} - \bar{r}}{\sigma_r + \epsilon}$, where $\epsilon > 0$ is a small constant that prevents division by zero |
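
The definitions in Table 1 can be computed for one group of $G$ rewards as sketched below. This is a minimal illustration, not a reference implementation: the function name `group_normalized_advantages` and the default `eps = 1e-8` are assumptions, and because the table does not say whether $\text{std}$ divides by $G$ or $G - 1$, the sketch assumes the population form (divide by $G$).

```python
import math
from typing import Sequence


def group_normalized_advantages(rewards: Sequence[float], eps: float = 1e-8) -> list[float]:
    """Return A^(i) = (r^(i) - mean) / (std + eps) for one group of G rewards.

    Hypothetical helper; eps and the population std are assumptions.
    """
    g = len(rewards)
    if g == 0:
        raise ValueError("a group needs at least one reward")
    mean = sum(rewards) / g  # r-bar = (1/G) * sum_i r^(i)
    # Population standard deviation (divide by G), assumed since the table does not specify.
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    # eps keeps the division finite when every completion gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]
```

For example, rewards `[1.0, 0.0, 0.0, 1.0]` give $\bar{r} = 0.5$ and $\sigma_r = 0.5$, so the advantages are approximately `[1, -1, -1, 1]`. A group whose completions all receive the same reward yields zero advantage for every member, so it contributes no preference signal.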

## Table 2

| Type | Structure/Sequence |
| :--- | :--- |
| Full Sequence | `board description + explanation tokens + [MOVE]` |
| Causal Chain | `… board … explanation …` |
| Move Context | `[prompt (board)] + [explanation]` |
| System Design Flow | `[BOARD AS TEXT] → [EXPLANATION TOKENS] → [MOVE TOKEN]` |
| Safe Bottleneck | `board text → transformer → explanation tokens → transformer → move (LM head)` |
| Broken Bottleneck Path 1 | `board → board encoder → board embedding → move head` |
| Broken Bottleneck Path 2 | `board text → transformer → explanation tokens` |
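
The sequence layout and the bottleneck distinction in Table 2 can be illustrated with the minimal sketch below. It makes several assumptions. Segments are joined with spaces rather than tokenized. `build_sequence`, `move_context`, `explanation_feeds_move`, and the node names in `SAFE` and `BROKEN` are hypothetical. The two "transformer" steps in the safe path are modeled as separate nodes for the same model. The edge from the board text to the move step reflects the Move Context row (prompt plus explanation). The check asks only whether the explanation lies on some path into the move.

```python
from collections import deque

MOVE_TAG = "[MOVE]"  # marker token taken from the Full Sequence row


def build_sequence(board_text: str, explanation: str, move: str) -> str:
    # [BOARD AS TEXT] -> [EXPLANATION TOKENS] -> [MOVE TOKEN] as one causal sequence.
    return f"{board_text} {explanation} {MOVE_TAG} {move}"


def move_context(sequence: str) -> str:
    # Under causal attention the move can attend only to what precedes [MOVE]:
    # the prompt (board) plus the explanation.
    return sequence[: sequence.index(MOVE_TAG)].rstrip()


def explanation_feeds_move(edges: list[tuple[str, str]]) -> bool:
    """True if a directed path leads from 'explanation tokens' to 'move' (breadth-first search)."""
    graph: dict[str, list[str]] = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    seen, queue = {"explanation tokens"}, deque(["explanation tokens"])
    while queue:
        node = queue.popleft()
        if node == "move":
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical encodings of the table rows as directed edges.
SAFE = [
    ("board text", "transformer (explain)"),
    ("transformer (explain)", "explanation tokens"),
    ("explanation tokens", "transformer (move)"),
    ("board text", "transformer (move)"),  # the prompt stays in the causal context
    ("transformer (move)", "move"),        # move read out by the LM head
]
BROKEN = [
    ("board", "board encoder"),            # Path 1: the move head never sees the explanation
    ("board encoder", "board embedding"),
    ("board embedding", "move head"),
    ("move head", "move"),
    ("board text", "transformer"),         # Path 2: the explanation is a dead end
    ("transformer", "explanation tokens"),
]
```

With these encodings, `explanation_feeds_move(SAFE)` is `True` and `explanation_feeds_move(BROKEN)` is `False`. In the broken design, the explanation can be generated without ever influencing the chosen move.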
