## Table 1

| Metric | Value |
| :--- | :--- |
| GPT-5.5 Pro Release Date | April 23, 2026 |
| Terminal-Bench 2.0 Score | 82.7% |
| OSWorld-Verified Score | 78.7% |
| GDPval Score | 84.9% |
| GPT-5.5 Pro Context Window | 1M tokens |
| LLM Go Evaluation Benchmark Accuracy | Below 35% |
| Hallucination Reduction (RAG) | 60–80% |

## Table 2

| Aspect | Accuracy | Verifiable? |
| :--- | :--- | :--- |
| Win rate / score lead values | ✅ High (from KataGo JSON) | ✅ Yes — traceable to engine output |
| Top move rankings | ✅ High (from KataGo) | ✅ Yes |
| Territory ownership descriptions | ✅ Moderate–High (from ownership map) | ⚠️ Partial |
| Strategic/conceptual explanations | ⚠️ Low–Moderate | ❌ No — LLM-generated, not grounded |
| Joseki/pattern identification | ⚠️ Unpredictable | ❌ No |
| Explaining *why* KataGo prefers a move | ⚠️ Often plausible but unreliable | ❌ No |
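
The "traceable to engine output" rows above can be checked mechanically. The following is a minimal sketch, assuming the explanation is plain text and that every percentage in it is meant as a win rate claim (a real checker would need to tell win rates apart from other percentages). The function name `check_winrate_claims` and the `tolerance` parameter are hypothetical, introduced here for illustration only.

```python
import re


def check_winrate_claims(explanation: str, engine_winrate: float,
                         tolerance: float = 0.01) -> list[tuple[str, bool]]:
    """Compare each percentage in the text with the engine's win rate.

    Assumption: every "NN%" in the explanation is a win rate claim.
    Returns (matched text, is_consistent) pairs in order of appearance.
    """
    results = []
    for match in re.finditer(r"(\d+(?:\.\d+)?)\s*%", explanation):
        claimed = float(match.group(1)) / 100.0
        consistent = abs(claimed - engine_winrate) <= tolerance
        results.append((match.group(0), consistent))
    return results


# The engine value would come from `rootInfo.winrate` in the KataGo response.
print(check_winrate_claims("Black is ahead with an 87% win rate.", 0.87))
print(check_winrate_claims("Black wins about 95% of the time.", 0.87))
```

A check like this covers only the quantitative rows of the table. The strategic and conceptual rows have no engine field to compare against, which is why they are marked as not verifiable.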

## Table 3

| Field | Meaning | Use in Explanation |
| :--- | :--- | :--- |
| `rootInfo.winrate` | Estimated win probability (reporting perspective is set by the engine's `reportAnalysisWinratesAs` option) | "Black's estimated win probability is 87%" |
| `rootInfo.scoreLead` | Predicted point lead | "Black leads by ~5.8 points" |
| `moveInfos[0].move` | Best move | "KataGo's top choice is Q4" |
| `moveInfos[].order` | Rank of each candidate | Compare top-3 candidates |
| `moveInfos[].visits` | Number of search visits spent on the move | Confidence signal (more visits, more reliable estimate) |
| `moveInfos[].pv` | Principal variation | "Expected continuation: Q4, R5, P5…" |
| `moveInfos[].prior` | Raw policy prior | How "obvious" the move is |
| `ownership[]` | Per-intersection ownership from -1 to +1 (361 values on a 19×19 board) | Territory map |
| `ownershipStdev[]` | Uncertainty per intersection | Contested regions |
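
To make the field mapping above concrete, here is a minimal sketch that turns a response shaped like the table into grounded statements. The `SAMPLE_RESPONSE` payload is hand-written for illustration and is not real engine output, and the `summarize` helper and its `top_n` parameter are hypothetical names. Only the field names listed in the table are assumed to exist in the response.

```python
import json

# Hand-written illustrative payload using the fields from the table above.
# The numbers are made up; they are not real KataGo output.
SAMPLE_RESPONSE = json.dumps({
    "rootInfo": {"winrate": 0.87, "scoreLead": 5.8},
    "moveInfos": [
        {"move": "Q4", "order": 0, "visits": 812, "prior": 0.41,
         "pv": ["Q4", "R5", "P5"]},
        {"move": "D16", "order": 1, "visits": 140, "prior": 0.22,
         "pv": ["D16", "C14"]},
        {"move": "R16", "order": 2, "visits": 48, "prior": 0.09,
         "pv": ["R16"]},
    ],
})


def summarize(response_json: str, top_n: int = 3) -> list[str]:
    """Build short statements whose numbers are copied from the engine JSON."""
    data = json.loads(response_json)
    root = data["rootInfo"]
    # Rank candidates by the `order` field rather than relying on list position.
    moves = sorted(data["moveInfos"], key=lambda m: m["order"])
    best = moves[0]
    lines = [
        f"Win rate: {root['winrate']:.0%}",
        f"Score lead: {root['scoreLead']:+.1f} points",
        f"Top choice: {best['move']} "
        f"({best['visits']} visits, prior {best['prior']:.2f})",
        "Expected continuation: " + ", ".join(best["pv"]),
    ]
    # Remaining candidates up to top_n, for side-by-side comparison.
    for m in moves[1:top_n]:
        lines.append(
            f"Alternative #{m['order'] + 1}: {m['move']} ({m['visits']} visits)"
        )
    return lines


for line in summarize(SAMPLE_RESPONSE):
    print(line)
```

Because every number in the output is formatted directly from a JSON field, each statement stays traceable to the engine. Whose perspective the win rate and score lead describe depends on how the engine is configured, so the sketch deliberately does not name a color.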

## Table 4

| Approach | Effort | Explanation Quality | Verifiability |
| :--- | :--- | :--- | :--- |
| GPT-5.5 + KataGo MCP (prompt only) | Low (hours) | Moderate; explanation is improvised | Partial (numbers grounded, concepts not) |
| SFT on KataGo analysis data (Tasks 1–2) | Medium (days) | Good on quantitative reasoning | High for evaluation claims |
| Full Mastermind-Go 4-task SFT | High (weeks) | Good; concepts from Go books | Moderate for strategic explanations |
| Full SFT + DPO with dan-level feedback | Very high (months) | Strong; aligned with expert understanding | High if citation-grounding is trained |
| Hybrid CNN encoder (frozen KataGo trunk) + LLM decoder | Research-level | Potentially highest; model "sees" the board | Requires interpretability work |
