{"url":"https://www.perplexity.ai/search/7c29e5d3-bda1-4894-8662-72a53f8cade9","title":"Perplexity","description":"Perplexity is a free AI-powered answer engine that provides accurate, trusted, and real-time answers to any question.","authors":[],"published_date":null,"domain":"perplexity.ai","is_paywall":false,"is_cached":false,"content":"### Dates and Benchmarks\n| Metric | Value |\n| :--- | :--- |\n| GPT-5.5 Pro Release Date | April 23, 2026 |\n| Terminal-Bench 2.0 Score | 82.7% |\n| OSWorld-Verified Score | 78.7% |\n| GDPval Score | 84.9% |\n| GPT-5.5 Pro Context Window | 1M-token |\n| LLM Go Evaluation Benchmark Accuracy | Below 35% |\n| Hallucination reduction (RAG) | 60–80% |\n\n### Practical Verdict: GPT-5.5 Pro + KataGo Tool Use\n| Aspect | Accuracy | Verifiable? |\n| :--- | :--- | :--- |\n| Win rate / score lead values | ✅ High (from KataGo JSON) | ✅ Yes — traceable to engine output |\n| Top move rankings | ✅ High (from KataGo) | ✅ Yes |\n| Territory ownership descriptions | ✅ Moderate–High (from ownership map) | ✅ Partial |\n| Strategic/conceptual explanations | ⚠️ Low–Moderate | ❌ No — LLM-generated, not grounded |\n| Joseki/pattern identification | ⚠️ Unpredictable | ❌ No |\n| Explaining *why* KataGo prefers a move | ⚠️ Often plausible but unreliable | ❌ No |\n\n### KataGo Analysis Engine Outputs (JSON-based)\n| Field | Meaning | Use in Explanation |\n| :--- | :--- | :--- |\n| `rootInfo.winrate` | Win probability for current player | \"Black wins in 87% of continuations\" |\n| `rootInfo.scoreLead` | Predicted point lead | \"Black leads by ~5.8 points\" |\n| `moveInfos[0].move` | Best move | \"KataGo's top choice is Q4\" |\n| `moveInfos[].order` | Rank of each candidate | Compare top-3 candidates |\n| `moveInfos[].visits` | Search depth per move | Confidence signal |\n| `moveInfos[].pv` | Principal variation | \"Expected continuation: Q4, R5, P5…\" |\n| `moveInfos[].prior` | Raw policy prior | How \"obvious\" the move is |\n| `ownership[]` | 361-length 
array, -1 to +1 | Territory map |\n| `ownershipStdev[]` | Uncertainty per intersection | Contested regions |\n\n### Architecture Variants\n| Approach | Effort | Explanation Quality | Verifiability |\n| :--- | :--- | :--- | :--- |\n| GPT-5.5 + KataGo MCP (prompt only) | Low (hours) | Moderate; explanation is improvised | Partial (numbers grounded, concepts not) |\n| SFT on KataGo analysis data (Tasks 1–2) | Medium (days) | Good on quantitative reasoning | High for evaluation claims |\n| Full Mastermind-Go 4-task SFT | High (weeks) | Good; concepts from Go books | Moderate for strategic explanations |\n| Full SFT + DPO with dan-level feedback | Very high (months) | Strong; aligned with expert understanding | High if citation-grounding is trained |\n| Hybrid CNN encoder (frozen KataGo trunk) + LLM decoder | Research-level | Potentially highest; model \"sees\" the board | Requires interpretability work |\n\n### Board Representation Symbols\n*   `#`: Black stone\n*   `o`: White stone\n*   `•`: Empty intersection\n*   Coordinates: Columns A–T (skipping I), Rows 1–19 (A1 = bottom-left)\n*   Move indices: `#(3)` (Black's 3rd stone), `o(1)` (White's 1st stone)\n\n### Dataset Training Curriculum (Mastermind-Go)\n*   **Task 1: State-Transition Prediction:** 150,000+ examples (99.44% single-task accuracy, 96.08% multi-task accuracy).\n*   **Task 2: KataGo Position Analysis:** 138,693 samples (Score MAE: 1.74 points, Win Rate MAE: 4.49%).\n*   **Task 3: Natural Language Explanation:** 1,503 samples from Go books (Target: 50,000+).\n*   **Task 4: Full Integrated Decision:** Action accuracy improved from 78% to 90% with Chain-of-Thought (CoT).\n\n### Model Hyperparameters\n```python\n# Key hyperparameters (from Mastermind-Go):\n# LoRA: r=32, alpha=64\n# Learning rate: 5e-5 with cosine decay\n# Gradient clipping: norm=1.0\n# Dropout: 0.1\n# Hardware: 8x A100 GPUs\n# Optimizer: AdamW\n```\n\n### Loss Function\n$$L_\\theta = -\\frac{1}{N} \\sum_{i=1}^{N} \\log P_\\theta(Y_i \\mid X_i)$$\n\n### 
Structured Output Schema (JSON)\n```json\n{\n \"winrate\": 0.73, // KataGo API — not generated\n \"score_lead\": 5.2, // KataGo API — not generated\n \"best_move\": \"R16\", // KataGo API — not generated\n \"ownership_clusters\": { // derived from KataGo — not generated\n \"black_secure\": [\"lower-left corner\", \"right side\"],\n \"contested\": [\"upper-center\"],\n \"white_secure\": [\"upper-left corner\"]\n },\n \"explanation\": \"...\" // LLM-generated, must reference above\n}\n```\n\n### Training Stage Proportions (Recommended)\n*   Task 1 (Rule Mastery): 40%\n*   Task 2 (Quantitative Grounding): 35%\n*   Task 3 (Conceptual): 15%\n*   Task 4 (Integrated Reasoning): 10%\n\n### Component Specifications\n*   **Frozen KataGo Trunk:** `b18c384nbt`, tapped at block 14.\n*   **Trunk Output:** `[B, 361, 384]` spatial sequence.\n*   **Q-Former Bridge:** 32 learnable query vectors, output `[B, 32, 4096]` tokens.\n*   **Scalar Prefix:** ~64 structured tokens.\n*   **LLM Decoders:** Qwen-2.5-7B, LLaMA-3.1-8B, Mistral-7B, Mixtral-8x7B, Gemma-7B, DeepSeek V4.\n\n### Implementation Checklist\n1.  Set up KataGo analysis engine with GPU; configure `analysis_example.cfg`.\n2.  Generate 200–500 KataGo self-play games (optimal + Top-p=0.4).\n3.  Build board serializer (text: `#`, `o`, `•` notation).\n4.  Extract Task 1 data: 150k state-transition examples.\n5.  Extract Task 2 data: 140k position-analysis examples (`includeOwnership: true`).\n6.  Curate Task 3 data: 5k–50k board+commentary pairs.\n7.  Build `Count()` tool function for territory counting.\n8.  Implement citation-grounded training template.\n9.  Fine-tune base LLM (≥7B) with LoRA.\n10. Generate preference pairs for DPO; recruit dan-level annotators.\n11. Run DPO alignment pass.\n12. Build structured output validator.\n13. 
Evaluate: Score MAE, Win Rate MAE, and explanation ROUGE-L.\n\n### Documented Artifacts and Filenames\n*   `analysis_example.cfg` (KataGo configuration file)\n*   `Mastermind-Go` (Pipeline/Model name)\n*   `Mastermind-Dou` (Framework/Model name)\n*   `ResTNet` (Architecture name)\n*   `KataGo MCP server` (Model Context Protocol tool)\n*   `Terminal-Bench 2.0` (Benchmark)\n*   `OSWorld-Verified` (Benchmark)\n*   `GDPval` (Benchmark)\n*   `LLM Go Evaluation Benchmark` (Benchmark)\n*   `go-arch` (Reference name)","error":null,"saved_to":"/home/user/.pplx/search/fetch/0970630c.json"}
