### Dates and Benchmarks
| Metric | Value |
| :--- | :--- |
| GPT-5.5 Pro Release Date | April 23, 2026 |
| Terminal-Bench 2.0 Score | 82.7% |
| OSWorld-Verified Score | 78.7% |
| GDPval Score | 84.9% |
| GPT-5.5 Pro Context Window | 1M tokens |
| LLM Go Evaluation Benchmark Accuracy | Below 35% |
| Hallucination reduction (RAG) | 60–80% |

### Practical Verdict: GPT-5.5 Pro + KataGo Tool Use
| Aspect | Accuracy | Verifiable? |
| :--- | :--- | :--- |
| Win rate / score lead values | ✅ High (from KataGo JSON) | ✅ Yes — traceable to engine output |
| Top move rankings | ✅ High (from KataGo) | ✅ Yes |
| Territory ownership descriptions | ✅ Moderate–High (from ownership map) | ✅ Partial |
| Strategic/conceptual explanations | ⚠️ Low–Moderate | ❌ No — LLM-generated, not grounded |
| Joseki/pattern identification | ⚠️ Unpredictable | ❌ No |
| Explaining *why* KataGo prefers a move | ⚠️ Often plausible but unreliable | ❌ No |

### KataGo Analysis Engine Outputs (JSON-based)
| Field | Meaning | Use in Explanation |
| :--- | :--- | :--- |
| `rootInfo.winrate` | Win probability (perspective set by `reportAnalysisWinratesAs` in the config) | "KataGo gives Black an 87% win probability" |
| `rootInfo.scoreLead` | Predicted point lead (same perspective as `winrate`) | "Black leads by ~5.8 points" |
| `moveInfos[0].move` | Best move | "KataGo's top choice is Q4" |
| `moveInfos[].order` | Rank of each candidate | Compare top-3 candidates |
| `moveInfos[].visits` | Number of search visits spent on the move | Confidence signal |
| `moveInfos[].pv` | Principal variation | "Expected continuation: Q4, R5, P5…" |
| `moveInfos[].prior` | Raw policy prior | How "obvious" the move is |
| `ownership[]` | Array of length 361 on 19×19 (row-major, A19 first, T1 last), values -1 to +1 | Territory map |
| `ownershipStdev[]` | Uncertainty per intersection | Contested regions |
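The fields above can be condensed into explanation-ready facts with a few lines of Python. This is a minimal sketch, assuming one response object already parsed from the engine's JSON output, values reported from Black's perspective (for example with `reportAnalysisWinratesAs = BLACK`), and KataGo's row-major ownership order starting at the top-left intersection. The helper names and the ±0.5 ownership threshold are hypothetical choices, and the point lists are only a rough proxy for the `Count()` territory tool in the implementation checklist, not a rules-accurate count.

```python
# GTP-style column letters; "I" is skipped.
COLUMNS = "ABCDEFGHJKLMNOPQRST"


def index_to_coord(idx: int, size: int = 19) -> str:
    """Map an ownership-array index to a coordinate (row-major from top-left)."""
    row, col = divmod(idx, size)
    return f"{COLUMNS[col]}{size - row}"


def summarize_analysis(response: dict, top_k: int = 3,
                       threshold: float = 0.5, size: int = 19) -> dict:
    """Condense one KataGo analysis response into facts an explanation can cite.

    Assumes values are reported from Black's perspective, so positive
    ownership means Black. The threshold is a hypothetical choice.
    """
    root = response["rootInfo"]
    # moveInfos are ranked by "order" (0 = best); sort defensively.
    moves = sorted(response["moveInfos"], key=lambda m: m["order"])[:top_k]
    ownership = response.get("ownership", [])
    black = [index_to_coord(i, size) for i, v in enumerate(ownership) if v > threshold]
    white = [index_to_coord(i, size) for i, v in enumerate(ownership) if v < -threshold]
    return {
        "winrate": root["winrate"],
        "score_lead": root["scoreLead"],
        # (move, visits, first five moves of the principal variation)
        "top_moves": [(m["move"], m["visits"], m["pv"][:5]) for m in moves],
        "black_points": black,
        "white_points": white,
        "contested_count": len(ownership) - len(black) - len(white),
    }
```

Keeping the numbers in a plain dictionary makes it easy to paste them into a prompt verbatim, so every quantitative claim in the explanation can be traced back to engine output.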

### Architecture Variants
| Approach | Effort | Explanation Quality | Verifiability |
| :--- | :--- | :--- | :--- |
| GPT-5.5 + KataGo MCP (prompt only) | Low (hours) | Moderate; explanation is improvised | Partial (numbers grounded, concepts not) |
| SFT on KataGo analysis data (Tasks 1–2) | Medium (days) | Good on quantitative reasoning | High for evaluation claims |
| Full Mastermind-Go 4-task SFT | High (weeks) | Good; concepts from Go books | Moderate for strategic explanations |
| Full SFT + DPO with dan-level feedback | Very high (months) | Strong; aligned with expert understanding | High if citation-grounding is trained |
| Hybrid CNN encoder (frozen KataGo trunk) + LLM decoder | Research-level | Potentially highest; model "sees" the board | Requires interpretability work |

### Board Representation Symbols
*   `#`: Black stone
*   `o`: White stone
*   `•`: Empty intersection
*   Coordinates: Columns A–T (the letter I is skipped), Rows 1–19 (A1 = bottom-left)
*   Move indices: `#(3)` (Black's 3rd stone), `o(1)` (White's 1st stone)
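A serializer for this notation can be sketched as follows. It assumes stones are given as a mapping from coordinates such as `"Q4"` to `"B"` or `"W"`; the column header and right-aligned row labels are an illustrative layout choice, and the `#(n)` move-index annotation is not implemented here.

```python
# GTP-style column letters; "I" is skipped.
COLUMNS = "ABCDEFGHJKLMNOPQRST"
SYMBOLS = {"B": "#", "W": "o", None: "•"}


def serialize_board(stones: dict, size: int = 19) -> str:
    """Render a position in the #/o/• notation, highest row on top, A1 bottom-left.

    `stones` maps coordinates such as "Q4" to "B" or "W"; missing points are empty.
    """
    lines = ["   " + " ".join(COLUMNS[:size])]
    for row in range(size, 0, -1):
        cells = [SYMBOLS[stones.get(f"{COLUMNS[col]}{row}")] for col in range(size)]
        lines.append(f"{row:>2} " + " ".join(cells))
    return "\n".join(lines)
```

For example, `serialize_board({"A1": "B", "C3": "W"}, size=3)` puts `#` at the bottom-left and `o` at the top-right of a 3×3 grid.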

### Dataset Training Curriculum (Mastermind-Go)
*   **Task 1: State-Transition Prediction:** 150,000+ examples (99.44% single-task accuracy, 96.08% multi-task accuracy).
*   **Task 2: KataGo Position Analysis:** 138,693 samples (Score MAE: 1.74 points, Win Rate MAE: 4.49%).
*   **Task 3: Natural Language Explanation:** 1,503 samples from Go books (Target: 50,000+).
*   **Task 4: Full Integrated Decision:** Action accuracy improved from 78% to 90% with Chain-of-Thought (CoT).
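Task 1 pairs a position and a move with the resulting position, so the data generator needs a correct move-application step. The sketch below applies a move and removes captured chains by flood-filling their liberties. Assumptions: points are 0-based `(col, row)` tuples, suicide is rejected (some rule sets, such as Tromp-Taylor, allow it), and ko is not checked. The function names are hypothetical.

```python
from typing import Dict, Iterator, Set, Tuple

Point = Tuple[int, int]   # (col, row), 0-based; row 0 is the bottom edge
Board = Dict[Point, str]  # occupied points only: "B" or "W"


def neighbors(p: Point, size: int) -> Iterator[Point]:
    c, r = p
    for dc, dr in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nc, nr = c + dc, r + dr
        if 0 <= nc < size and 0 <= nr < size:
            yield (nc, nr)


def group_and_liberties(board: Board, start: Point, size: int) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the chain containing `start`; return (stones, liberties)."""
    color = board[start]
    group, libs, stack = {start}, set(), [start]
    while stack:
        p = stack.pop()
        for n in neighbors(p, size):
            occupant = board.get(n)
            if occupant is None:
                libs.add(n)
            elif occupant == color and n not in group:
                group.add(n)
                stack.append(n)
    return group, libs


def play(board: Board, move: Point, color: str, size: int = 19) -> Board:
    """Return the position after `color` plays `move`; captures are removed.

    Suicide raises ValueError; ko is not checked (both are assumptions).
    """
    if move in board:
        raise ValueError(f"{move} is occupied")
    nxt = dict(board)
    nxt[move] = color
    opponent = "W" if color == "B" else "B"
    for n in neighbors(move, size):
        if nxt.get(n) == opponent:
            group, libs = group_and_liberties(nxt, n, size)
            if not libs:  # opponent chain lost its last liberty
                for p in group:
                    del nxt[p]
    if not group_and_liberties(nxt, move, size)[1]:
        raise ValueError(f"{move} would be suicide")
    return nxt
```

Each `(board, move, play(board, move, color))` triple is one candidate state-transition example; the input board is copied, so the "before" state stays intact for serialization.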

### Model Hyperparameters
```python
# Key hyperparameters (from Mastermind-Go)
MASTERMIND_GO_HPARAMS = {
    "lora_r": 32,
    "lora_alpha": 64,
    "learning_rate": 5e-5,
    "lr_schedule": "cosine",      # cosine decay
    "grad_clip_norm": 1.0,
    "dropout": 0.1,
    "optimizer": "AdamW",
    "hardware": "8x A100 GPUs",
}
```
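The schedule and clipping settings listed above can be illustrated in plain Python. This sketch assumes cosine decay to zero with no warmup (neither a warmup length nor a final learning rate is given) and clipping by global L2 norm. It also shows the `alpha / r` factor that standard LoRA applies to the low-rank update, which is 2.0 for these values.

```python
import math
from typing import List, Sequence

BASE_LR = 5e-5        # learning rate from the list above
MAX_GRAD_NORM = 1.0   # gradient clipping norm
LORA_R, LORA_ALPHA = 32, 64


def cosine_lr(step: int, total_steps: int, base_lr: float = BASE_LR,
              min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr at step 0 to min_lr at total_steps (no warmup)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def clip_by_global_norm(grads: Sequence[float], max_norm: float = MAX_GRAD_NORM) -> List[float]:
    """Rescale gradients so their combined L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * (max_norm / norm) for g in grads]


# Standard LoRA scales the low-rank update B @ A by alpha / r.
LORA_SCALING = LORA_ALPHA / LORA_R
```

In a PyTorch training loop the same two steps are usually delegated to `torch.optim.lr_scheduler.CosineAnnealingLR` and `torch.nn.utils.clip_grad_norm_`.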

### Loss Function
$$L_\theta = -\frac{1}{N} \sum_{i=1}^{N} \log P_\theta(Y_i \mid X_i)$$
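As a worked instance of this negative log-likelihood, assume the model's probabilities for each target token of $Y_i$ are already available under teacher forcing. Then $\log P_\theta(Y_i \mid X_i)$ is the sum of the per-token log-probabilities, and the loss averages the negated sums over the $N$ samples. The function names are hypothetical.

```python
import math
from typing import Sequence


def sequence_log_prob(token_probs: Sequence[float]) -> float:
    """log P_theta(Y_i | X_i): sum of log-probabilities of the target tokens."""
    return sum(math.log(p) for p in token_probs)


def sft_loss(batch: Sequence[Sequence[float]]) -> float:
    """L_theta: negative log-likelihood averaged over the N samples in the batch."""
    return -sum(sequence_log_prob(sample) for sample in batch) / len(batch)
```

For two samples with token probabilities `[0.5, 0.5]` and `[1.0]`, the loss is $-(\log 0.25 + 0)/2 = \ln 2 \approx 0.693$. Many training frameworks average over target tokens instead of samples, which weights long answers differently from the per-sample average in the formula.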

### Structured Output Schema (JSON)
```jsonc
{
  "winrate": 0.73,              // KataGo API — not generated
  "score_lead": 5.2,            // KataGo API — not generated
  "best_move": "R16",           // KataGo API — not generated
  "ownership_clusters": {       // derived from KataGo — not generated
    "black_secure": ["lower-left corner", "right side"],
    "contested": ["upper-center"],
    "white_secure": ["upper-left corner"]
  },
  "explanation": "..."          // LLM-generated, must reference above
}
```
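Step 12 of the checklist calls for a structured output validator. The sketch below is one way to enforce this schema's intent: the grounded fields must match the engine values, and the explanation must cite them. It assumes the model emits strict JSON (the `//` comments in the schema are annotations, not part of the output), and the citation rule (best move plus score lead to one decimal place) is a hypothetical heuristic rather than a documented requirement.

```python
import json
import math
from typing import List

NUMERIC_FIELDS = ("winrate", "score_lead")
REQUIRED_FIELDS = NUMERIC_FIELDS + ("best_move", "explanation")


def validate_output(raw: str, engine: dict, tol: float = 1e-6) -> List[str]:
    """Check one model output against the KataGo values it should copy.

    `engine` holds the values taken from KataGo, e.g.
    {"winrate": 0.73, "score_lead": 5.2, "best_move": "R16"}.
    Returns a list of problems; an empty list means the output passed.
    """
    try:
        out = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(out, dict):
        return ["top level is not a JSON object"]
    missing = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in out]
    if missing:
        return missing
    problems = []
    for field in NUMERIC_FIELDS:
        value = out[field]
        if not isinstance(value, (int, float)) or not math.isclose(value, engine[field], abs_tol=tol):
            problems.append(f"{field} does not match KataGo")
    if out["best_move"] != engine["best_move"]:
        problems.append("best_move does not match KataGo")
    # Hypothetical citation rule: the explanation must mention the grounded values.
    text = str(out["explanation"])
    if engine["best_move"] not in text:
        problems.append("explanation does not cite best_move")
    if f"{engine['score_lead']:.1f}" not in text:
        problems.append("explanation does not cite score_lead")
    return problems
```

Returning a list of problems rather than a boolean makes the validator usable both as a hard gate at inference time and as a source of rejected samples when building preference pairs.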

### Training Stage Proportions (Recommended)
*   Task 1 (Rule Mastery): 40%
*   Task 2 (Quantitative Grounding): 35%
*   Task 3 (Conceptual): 15%
*   Task 4 (Integrated Reasoning): 10%
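One way to turn these proportions into per-task example counts for a fixed training budget is largest-remainder rounding, which keeps the total exact. The task keys below are hypothetical labels, and the mix is assumed to sum to 1.

```python
import math
from typing import Dict

# Recommended mix (fractions of the training set); keys are hypothetical labels.
TASK_MIX = {"task1_rules": 0.40, "task2_quant": 0.35,
            "task3_concept": 0.15, "task4_integrated": 0.10}


def allocate(total: int, mix: Dict[str, float] = TASK_MIX) -> Dict[str, int]:
    """Split `total` examples across tasks; largest-remainder rounding keeps the sum exact."""
    raw = {task: total * share for task, share in mix.items()}
    counts = {task: math.floor(value) for task, value in raw.items()}
    leftover = total - sum(counts.values())
    # Give the remaining examples to the tasks with the largest fractional parts.
    for task in sorted(raw, key=lambda t: raw[t] - counts[t], reverse=True)[:leftover]:
        counts[task] += 1
    return counts
```

For 1,000 examples this gives 400, 350, 150 and 100.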

### Component Specifications
*   **Frozen KataGo Trunk:** `b18c384nbt`, tapped at block 14.
*   **Trunk Output:** `[B, 361, 384]` spatial sequence.
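*   **Q-Former Bridge:** 32 learnable query vectors, output `[B, 32, 4096]` tokens (4096 is the hidden size of LLaMA-3.1-8B, Mistral-7B and Mixtral-8x7B; decoders such as Qwen-2.5-7B or Gemma-7B need a different output width).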
*   **Scalar Prefix:** ~64 structured tokens.
*   **LLM Decoders:** Qwen-2.5-7B, LLaMA-3.1-8B, Mistral-7B, Mixtral-8x7B, Gemma-7B, DeepSeek V4.
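The data flow through the bridge can be sketched with a single cross-attention step: the learned queries attend over the trunk positions, and the pooled result is projected to the decoder width. This is a deliberate simplification in pure Python (one head, one layer, no self-attention among queries, no layer norm or MLP), not the specified implementation; the `QFormerBridge` class and the attention width `d_attn` are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-list matrix product: [n, k] x [k, m] -> [n, m]."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class QFormerBridge:
    """One cross-attention step from board features to a fixed set of LLM tokens.

    Hypothetical simplification: single head, single layer, no query
    self-attention, layer norm or MLP. Weights are random here; they would
    be trained while the KataGo trunk stays frozen.
    """

    def __init__(self, channels: int, n_queries: int, d_attn: int, d_model: int, seed: int = 0):
        rng = random.Random(seed)
        self.queries = rand_matrix(n_queries, d_attn, rng)  # learnable query vectors
        self.w_k = rand_matrix(channels, d_attn, rng)
        self.w_v = rand_matrix(channels, d_attn, rng)
        self.w_out = rand_matrix(d_attn, d_model, rng)      # projection to decoder width
        self.scale = 1.0 / math.sqrt(d_attn)

    def forward(self, batch: List[Matrix]) -> List[Matrix]:
        """[B][P][C] trunk features -> [B][n_queries][d_model] soft tokens."""
        out = []
        for feats in batch:
            keys = matmul(feats, self.w_k)      # [P, d_attn]
            values = matmul(feats, self.w_v)    # [P, d_attn]
            scores = [[self.scale * sum(q * k for q, k in zip(qv, kv)) for kv in keys]
                      for qv in self.queries]   # [n_queries, P]
            attn = [softmax(row) for row in scores]
            pooled = matmul(attn, values)       # [n_queries, d_attn]
            out.append(matmul(pooled, self.w_out))
        return out
```

With the listed sizes, `channels=384`, `n_queries=32` and `d_model=4096` map `[B, 361, 384]` trunk features to `[B, 32, 4096]` tokens; at that scale a tensor library should replace the list-based matrix products.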

### Implementation Checklist
1.  Set up KataGo analysis engine with GPU; configure `analysis_example.cfg`.
2.  Generate 200–500 KataGo self-play games (optimal + Top-p=0.4).
3.  Build board serializer (text: `#`, `o`, `•` notation).
4.  Extract Task 1 data: 150k state-transition examples.
5.  Extract Task 2 data: 140k position-analysis examples (`includeOwnership: true`).
6.  Curate Task 3 data: 5k–50k board+commentary pairs.
7.  Build `Count()` tool function for territory counting.
8.  Implement citation-grounded training template.
9.  Fine-tune base LLM (≥7B) with LoRA.
10. Generate preference pairs for DPO; recruit dan-level annotators.
11. Run DPO alignment pass.
12. Build structured output validator.
13. Evaluate: Score MAE, Win Rate MAE, explanation ROUGE-L.
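The evaluation metrics in step 13 are short to compute. This sketch uses lowercase whitespace tokenization and no stemming for ROUGE-L (F1 from the longest common subsequence), so its scores can differ from packaged ROUGE implementations that tokenize or stem differently. The function names are hypothetical.

```python
from typing import Sequence


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error (Score MAE in points, Win Rate MAE on the winrate scale)."""
    if len(pred) != len(target) or not target:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (dynamic programming, O(len(a)*len(b)))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 on lowercase whitespace tokens (no stemming)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

To report Win Rate MAE as a percentage like the 4.49% figure above, multiply the result by 100 when winrates are on a 0–1 scale.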

### Documented Artifacts and Filenames
*   `analysis_example.cfg` (KataGo configuration file)
*   `Mastermind-Go` (Pipeline/Model name)
*   `Mastermind-Dou` (Framework/Model name)
*   `ResTNet` (Architecture name)
*   `KataGo MCP server` (Model Context Protocol tool)
*   `Terminal-Bench 2.0` (Benchmark)
*   `OSWorld-Verified` (Benchmark)
*   `GDPval` (Benchmark)
*   `LLM Go Evaluation Benchmark` (Benchmark)
*   `go-arch` (Reference name)