{"url":"https://www.perplexity.ai/search/a1b9115c-06de-4fb1-8eb5-2bc9468ec18b","title":"Perplexity","description":"Perplexity is a free AI-powered answer engine that provides accurate, trusted, and real-time answers to any question.","authors":[],"published_date":null,"domain":"perplexity.ai","is_paywall":false,"is_cached":false,"content":"### Filenames and Artifacts\n* `codex_pilot_spec_v3`\n* `codex/` (Repository Root)\n* `codex/README.md`\n* `codex/data/positions/`\n* `codex/data/katago_cache.jsonl`\n* `codex/data/splits/train.txt`\n* `codex/data/splits/test.txt`\n* `codex/go_engine/board.py`\n* `codex/go_engine/tools.py`\n* `codex/go_engine/ladder.py`\n* `codex/go_engine/territory.py`\n* `codex/go_engine/life_death.py`\n* `codex/boardmatcher_server/server.js`\n* `codex/boardmatcher_server/package.json`\n* `codex/boardmatcher_server/client.py`\n* `codex/katago/subprocess_manager.py`\n* `codex/katago/batch_analyzer.py`\n* `codex/katago/cache.py`\n* `codex/training/cold_start_sft.py`\n* `codex/training/grpo_train.py`\n* `codex/training/reward.py`\n* `codex/training/claim_parser.py`\n* `codex/training/counterfactual.py`\n* `codex/training/tool_env.py`\n* `codex/training/config.py`\n* `codex/evaluation/eval.py`\n* `codex/evaluation/concept_accuracy.py`\n* `codex/evaluation/tier3_accuracy.py`\n* `codex/evaluation/causal_eval.py`\n* `codex/prompts/pass1_system.txt`\n* `codex/prompts/pass2_system.txt`\n* `codex/prompts/tool_definitions.json`\n* `codex/preprocess/katago_batch.py`\n* `codex/preprocess/filter_positions.py`\n* `codex/preprocess/generate_sft_demos.py`\n* `codex/requirements.txt`\n\n### Code Blocks and Verbatim Snippets\n\n**Architecture Option 1 Flow**\n```text\nPass 1: board → concept schema\nPass 2: concepts → PVs / ownership / score / winrate\nPass 3: concept schema + Pass 2 predictions → natural language explanation\n```\n\n**Interleaved Prose and Claim Sequence Example**\n```text\nThe top-right corner is the critical area this move. White's group at R15\n<claim id=\"G1.in_atari\">true</claim> is in atari with\n<claim id=\"G1.liberties\">1</claim> liberty remaining. The ladder at S15\n<claim id=\"ladder_exists\">true</claim> works for Black\n<claim id=\"ladder_favor\">black</claim>, so White cannot escape by running.\nThis makes the position extremely sharp\n<claim id=\"position_sharpness\">sharp</claim> — Black stands to gain roughly\n6 points <claim id=\"best_move_score_delta\">6.0</claim> and holds a\n<claim id=\"global_black_winrate\">63</claim>% win rate.\n```\n\n**Board Representation**\n```text\nSide: Black | Komi: 7.5 | Move: 84 | Last: W R16\n A B C D E F G H J K L M N O P Q R S T\n19 . . . . . . . . . . . . . . . . . . .\n18 . . . B . . . . . . . . . . . . . . .\n...\n```\n\n**Inline Claim Tag Format**\n```text\n<claim id=\"FIELD_NAME\">VALUE</claim>\n```\n\n**Pass 1 System Prompt**\n```text\nYou are a Go analysis engine. Given a board position, explain your analysis in natural\nlanguage and assert structured claims using inline <claim> tags.\n\nYour explanation should proceed as follows:\n- Call tools to verify tactical and evaluative facts before asserting them.\n- Write one or a few sentences explaining your reasoning about a concept.\n- Immediately follow with a <claim id=\"field_name\">value</claim> tag asserting that concept.\n- Continue to the next concept.\n\nAvailable tools:\n- Go engine: count_liberties, is_in_atari, get_group_size, is_connected_to_edge,\n check_ladder, is_ko_active, check_capture_race.\n- Pattern tools: name_move, find_pattern.\n- KataGo: query_katago (max_visits 100–200).\n\nAfter completing your explanation and all claim tags, output nothing further.\nPass 2 will receive only your claim values.\n```\n\n**Pass 1 Expected Generation Trace**\n```text\n<tool_call>{\"name\": \"query_katago\", \"arguments\": {\"max_visits\": 150,\n \"include_ownership\": true, \"include_ownership_stdev\": true}}</tool_call>\n<tool_response>{\"rootInfo\": {\"winrate\": 0.63, \"scoreLead\": 4.5,\n \"scoreStdev\": 18.2, \"rawStWrError\": 0.07, \"rawVarTimeLeft\": 38.0,\n \"utility\": 0.74},\n \"moveInfos\": [\n {\"move\": \"S15\", \"utility\": 0.74, \"scoreLead\": 10.5, \"order\": 0,\n \"pv\": [\"S15\", \"T15\", \"T16\", \"S16\", \"R16\"], \"visits\": 120},\n {\"move\": \"Q15\", \"utility\": 0.61, \"scoreLead\": 6.2, \"order\": 1,\n \"pv\": [\"Q15\", \"R14\", \"S14\"], \"visits\": 20}],\n \"ownership\": [...], \"ownershipStdev\": [...]}</tool_response>\n\n<tool_call>{\"name\": \"is_in_atari\", \"arguments\": {\"anchor\": \"R15\"}}</tool_call>\n<tool_response>true</tool_response>\n\n<tool_call>{\"name\": \"count_liberties\", \"arguments\": {\"anchor\": \"R15\"}}</tool_call>\n<tool_response>1</tool_response>\n\n<tool_call>{\"name\": \"check_ladder\", \"arguments\":\n {\"prey_anchor\": \"R15\", \"chaser_anchor\": \"Q15\"}}</tool_call>\n<tool_response>{\"ladder_works\": true, \"favor\": \"black\", \"escapes\": false}</tool_response>\n\n<tool_call>{\"name\": \"name_move\", \"arguments\": {\"coord\": \"S15\"}}</tool_call>\n<tool_response>\"Hane\"</tool_response>\n\nThis is a middlegame position with action concentrated in the top-right corner.\n<claim id=\"game_phase\">middlegame</claim>\n\nThe KataGo evaluation shows high short-term uncertainty (rawStWrError 0.07), indicating\nthe position is tactically critical — a mistake here would be costly.\n<claim id=\"position_sharpness\">sharp</claim>\n\nThe overall game score is contested with a scoreStdev of 18.2, suggesting neither\nside has secured the game yet.\n<claim id=\"score_uncertainty\">medium</claim>\n\nThe top-right corner is the key area, with significant ownership uncertainty —\nboth sides can still influence the outcome there.\n<claim id=\"focus_region\">top-right</claim>\n<claim id=\"focus_urgency\">high</claim>\n<claim id=\"focus_ownership_balance\">-3.5</claim>\n<claim id=\"focus_ownership_uncertainty\">high</claim>\n\nWhite's group anchored at R15 has only one liberty and is in atari. It consists\nof 4 stones and has no connection to the board edge, making it vulnerable.\n<claim id=\"G1.color\">white</claim>\n<claim id=\"G1.anchor\">R15</claim>\n<claim id=\"G1.liberties\">1</claim>\n<claim id=\"G1.in_atari\">true</claim>\n<claim id=\"G1.size\">4</claim>\n<claim id=\"G1.connected_to_edge\">false</claim>\n<claim id=\"G1.is_key_group\">true</claim>\n\nThe ownershipStdev values over this group are high (mean 0.31), confirming the\ngroup's status is genuinely unsettled — its fate determines much of the position.\n<claim id=\"G1.ownership_confidence\">contested</claim>\n<claim id=\"G1.status\">unsettled</claim>\n\nThe ladder at S15 works for Black and White cannot escape by running out.\n<claim id=\"ladder_exists\">true</claim>\n<claim id=\"ladder_favor\">black</claim>\n\nThere is no ko fight currently active, and no capture race between major groups.\n<claim id=\"ko_active\">false</claim>\n<claim id=\"capture_race_exists\">false</claim>\n<claim id=\"capture_race_favor\">none</claim>\n\nThe primary candidate S15 is a Hane, a natural shape in this configuration.\n<claim id=\"focus_move_pattern\">Hane</claim>\n<claim id=\"focus_shape_type\">null</claim>\n\nKataGo rates S15 at utility 0.74, significantly ahead of the second option Q15\nat 0.61 — a large gap suggesting S15 is clearly best here.\n<claim id=\"best_move_utility\">0.74</claim>\n<claim id=\"utility_gap\">large</claim>\n<claim id=\"best_move_score_delta\">6.0</claim>\n\nBlack leads by approximately 4.5 points with a 63% win rate globally.\n<claim id=\"global_black_score_lead\">4.5</claim>\n<claim id=\"global_black_winrate\">63</claim>\n```\n\n**Pass 2 Input (Reconstructed JSON)**\n```json\n{\n \"game_phase\": \"middlegame\",\n \"position_sharpness\": \"sharp\",\n \"score_uncertainty\": \"medium\",\n \"focus_region\": \"top-right\",\n \"focus_urgency\": \"high\",\n \"focus_ownership_balance\": -3.5,\n \"focus_ownership_uncertainty\": \"high\",\n \"G1\": {\n \"color\": \"white\", \"anchor\": \"R15\", \"liberties\": 1,\n \"in_atari\": true, \"size\": 4, \"connected_to_edge\": false,\n \"is_key_group\": true, \"ownership_confidence\": \"contested\",\n \"status\": \"unsettled\"\n },\n \"ladder_exists\": true, \"ladder_favor\": \"black\",\n \"ko_active\": false,\n \"capture_race_exists\": false, \"capture_race_favor\": \"none\",\n \"focus_move_pattern\": \"Hane\", \"focus_shape_type\": null,\n \"best_move_utility\": 0.74, \"utility_gap\": \"large\",\n \"best_move_score_delta\": 6.0,\n \"global_black_score_lead\": 4.5, \"global_black_winrate\": 63\n}\n```\n\n**Pass 2 System Prompt**\n```text\nGiven the following Go position concept analysis, predict:\n1. Top-3 principal variations (PV lines), each as a move sequence of up to 8 moves.\n2. Ownership in the focus region: net balance (positive = Black) and uncertainty level.\n3. Global score lead (Black positive, in points).\n4. Global Black win rate (0–100%).\n```\n\n**Pass 2 Expected Output**\n```json\n{\n \"PV1\": [\"B S15\", \"W T15\", \"B T16\", \"W S16\", \"B R16\"],\n \"PV2\": [\"B Q15\", \"W R14\", \"B S14\", \"W T15\"],\n \"PV3\": [\"B R16\", \"W S15\", \"B T15\", \"W T16\"],\n \"focus_ownership\": {\"balance\": -3.5, \"uncertainty\": \"high\"},\n \"score_lead\": 4.5,\n \"winrate\": 63\n}\n```\n\n**Tool Definitions Group A: Go rules engine**\n```python\ndef count_liberties(anchor: str) -> int\ndef is_in_atari(anchor: str) -> bool\ndef get_group_size(anchor: str) -> int\ndef is_connected_to_edge(anchor: str) -> bool\ndef check_ladder(prey_anchor: str, chaser_anchor: str) -> dict\ndef is_ko_active() -> bool\ndef check_capture_race(group1_anchor: str, group2_anchor: str) -> dict\n```\n\n**Tool Definitions Group B: Sabaki boardmatcher**\n```python\ndef name_move(coord: str) -> str | None\ndef find_pattern(coord: str) -> str | None\n```\n\n**Tool Definitions Group C: KataGo live query**\n```python\ndef query_katago(\n max_visits: int = 200,\n include_ownership: bool = True,\n include_ownership_stdev: bool = True,\n include_policy: bool = False\n) -> dict\n```\n\n**KataGo Request Configuration**\n```json\n{\n \"maxVisits\": 800,\n \"includeOwnership\": true,\n \"includeOwnershipStdev\": true,\n \"includeMovesOwnership\": true,\n \"includeMovesOwnershipStdev\": false,\n \"includePolicy\": true,\n \"includePVVisits\": true,\n \"analysisPVLen\": 10\n}\n```\n\n**Reward Formulas**\n```text\nR_out = 0.30×R_pv + 0.20×R_own + 0.20×R_tier3 + 0.15×R_score + 0.15×R_wr\n\nΔ_j = R_out(original) − R_out(counterfactual_j)\nR_cf = Σ_j clip(Δ_j, 0, 0.5)\n\nR_total = R_fmt × (0.35×R_out + 0.35×R_proc + 0.30×R_cf)\n```\n\n### Data Tables\n\n**Self-Verification Claim Type Table**\n| Claim Type | Self-Verification Sufficient? | Why |\n| :--- | :--- | :--- |\n| Liberty/atari count | Yes, with board simulation | Mechanically decidable, no estimation error |\n| Ladder/net sequence | Yes, with step-by-step tracing | Deterministic tree search |\n| Group life-and-death status | Partially — easy cases yes | Hard cases require deep search; model errors correlate with generation errors |\n| Territory estimate | No | Requires understanding of influence and boundary disputes, highly model-error-prone |\n| \"This move is urgent\" / \"sente\" claims | No | Requires comparative win-rate evaluation across candidate moves |\n\n**Tier 1 — Board-engine Verifiable Fields**\n| Field | Type | Description |\n| :--- | :--- | :--- |\n| `Gk.color` | enum: `black`, `white` | Stone color |\n| `Gk.anchor` | string: coordinate | Identifier stone |\n| `Gk.liberties` | int 0–8 | Exact liberty count |\n| `Gk.in_atari` | bool | liberties == 1 |\n| `Gk.size` | int | Number of stones in group |\n| `Gk.connected_to_edge` | bool | Whether stone touches board edge |\n| `Gk.is_key_group` | bool | Assessment of centrality |\n| `ladder_exists` | bool | Ladder fight present |\n| `ladder_favor` | enum: `black`, `white`, `none` | Who wins the ladder |\n| `ko_active` | bool | Ko fight active |\n| `capture_race_exists` | bool | Semeai present |\n| `capture_race_favor` | enum: `black`, `white`, `even`, `none` | Who wins the capture race |\n\n**Tier 2 — boardmatcher-grounded Fields**\n| Field | Type | Description |\n| :--- | :--- | :--- |\n| `focus_move_pattern` | string or `null` | boardmatcher `nameMove()` |\n| `focus_shape_type` | string or `null` | boardmatcher `findPatternInMove()` |\n\n**Tier 3 — KataGo-oracle-grounded Fields**\n| Field | KataGo source | Description |\n| :--- | :--- | :--- |\n| `game_phase` | `rawVarTimeLeft` | opening, middlegame, endgame |\n| `position_sharpness` | `rawStWrError` | sharp, normal, quiet |\n| `score_uncertainty` | `scoreStdev` | high, medium, low |\n| `focus_region` | Model-selected | Region of relevance |\n| `focus_urgency` | composite | high, medium, low |\n| `focus_ownership_balance` | `ownership` sum | Net ownership, bucketed to 0.5 |\n| `focus_ownership_uncertainty` | `ownershipStdev` sum | high, medium, low |\n| `Gk.ownership_confidence` | `ownershipStdev` mean | settled, contested |\n| `Gk.status` | ownership + stdev | alive, dead, unsettled |\n| `best_move_utility` | `moveInfos[0].utility` | Bucketed to 0.01 |\n| `utility_gap` | utility diff | large, medium, small |\n| `best_move_score_delta` | scoreLead diff | Bucketed to 0.5 |\n| `global_black_score_lead` | `rootInfo.scoreLead` | Bucketed to 0.5 |\n| `global_black_winrate` | `rootInfo.winrate` | Bucketed to 1% |\n\n**Bucketing Rules**\n| Field | Rule |\n| :--- | :--- |\n| `position_sharpness` | rawStWrError: > 0.06 → sharp, 0.02–0.06 → normal, < 0.02 → quiet |\n| `score_uncertainty` | scoreStdev: > 20 → high, 10–20 → medium, < 10 → low |\n| `focus_urgency` | ownershipStdev region sum > 5 and rawStWrError > 0.03 → high; either → medium; neither → low |\n| `focus_ownership_uncertainty` | ownershipStdev region sum: > 5 → high, 2–5 → medium, < 2 → low |\n| `Gk.ownership_confidence` | ownershipStdev mean over group: > 0.2 → contested, ≤ 0.2 → settled |\n| `Gk.status` | ownership mean > 0.7 (correct color) and stdev < 0.2 → alive; mean < −0.3 → dead; otherwise unsettled |\n| `utility_gap` | diff > 0.1 → large, 0.03–0.1 → medium, < 0.03 → small |\n| `game_phase` | rawVarTimeLeft: > 60 → opening, 20–60 → middlegame, < 20 → endgame |\n\n**Process Self-Reward (R_proc) Scoring Table**\n| Scenario | Score |\n| :--- | :--- |\n| Tool called, claim matches tool response | +1.0 × weight |\n| Tool not called, claim correct per post-hoc engine check | +0.5 × weight |\n| Tool not called, claim incorrect | −0.5 × weight |\n| Tool called, claim contradicts tool response | −1.5 × weight |\n| boardmatcher called, claim matches | +1.0 × weight |\n| boardmatcher called, claim doesn't match | −0.5 × weight |\n| boardmatcher not called, claim matches post-hoc | +0.3 × weight |\n\n**Field Weights (Tier 1 + Tier 2)**\n| Field | Weight |\n| :--- | :--- |\n| `Gk.in_atari` | 0.16 |\n| `Gk.liberties` | 0.12 |\n| `ladder_exists` + `ladder_favor` | 0.16 |\n| `ko_active` | 0.08 |\n| `Gk.status` | 0.12 |\n| `capture_race_exists` + `capture_race_favor` | 0.12 |\n| `Gk.size` | 0.06 |\n| `Gk.connected_to_edge` | 0.06 |\n| `focus_move_pattern` | 0.08 |\n| `focus_shape_type` | 0.04 |\n\n**Tier 3 Bucket Accuracy Weights**\n| Field | Weight |\n| :--- | :--- |\n| `score_uncertainty` | 0.20 |\n| `position_sharpness` | 0.20 |\n| `focus_urgency` | 0.20 |\n| `focus_ownership_uncertainty` | 0.20 |\n| `game_phase` | 0.10 |\n| `utility_gap` | 0.10 |\n\n**Counterfactual Targets**\n| Field | Perturbation |\n| :--- | :--- |\n| `G1.in_atari` | flip bool; adjust liberties |\n| `ladder_exists` | flip bool; if false, set `ladder_favor = none` |\n| `focus_ownership_uncertainty` | `high` ↔ `low`; `medium` unchanged |\n| `position_sharpness` | `sharp` ↔ `quiet`; `normal` unchanged |\n| `utility_gap` | `large` ↔ `small`; `medium` unchanged |\n\n**Sequence Length Budgets**\n| Pass | v0.2 budget | v0.3 budget | Reason |\n| :--- | :--- | :--- | :--- |\n| Pass 1 max tokens | 768 | 1024 | Prose interleaved with tags adds ~250 tokens |\n| Pass 2 max tokens | 256 | 256 | Unchanged |\n\n**Key Hyperparameters**\n| Parameter | v0.3 value |\n| :--- | :--- |\n| Max Pass 1 tokens | 1024 |\n| Max Pass 2 tokens | 256 |\n| Group size G | 6 |\n| Learning rate | 5e-6 |\n| KL coefficient β | 0.04 |\n| Reference model | Qwen3-8B-Instruct frozen |\n| Training steps | 500–1000 |\n| Warmup steps | 50 |\n| Counterfactual fields per rollout | 5 |\n\n**Data Pipeline Source Fractions**\n| Source | Fraction |\n| :--- | :--- |\n| Professional game corpora | 40% |\n| Synthetic critical positions (KataGo-filtered) | 40% |\n| LoGos training positions with low explanation fidelity | 20% |\n\n**Estimated Training Timeline (10k positions, 1 epoch)**\n| Phase | Estimated time |\n| :--- | :--- |\n| KataGo pre-analysis (800 visits) | ~2 hours |\n| boardmatcher pre-computation | ~30 minutes |\n| SFT cold-start (1000 demos, 2 epochs) | ~2–3 hours |\n| RL training (500 steps, G=6, 5 cf, 1024 tokens) | ~20–28 hours |\n| Evaluation + human review sample | ~3 hours |\n| **Total** | **~28–36 hours** |\n\n### Lists and Enumerations\n\n**Base Model Selection**\n*   Recommended: Qwen3-8B (non-thinking mode)\n*   Fallback: Qwen2.5-7B-Instruct\n\n**Training Filtering Criteria**\n*   Top-3 KataGo moves differ by ≥ 5% utility\n*   `scoreStdev` > 15\n*   At least one of: atari present, ladder present, capture race, `ownershipStdev` sum over any quadrant > 3\n\n**Pilot Success Criteria**\n1.  `in_atari` accuracy > 95%, self-contradiction rate < 5%\n2.  `ladder_exists` accuracy > 85%\n3.  `focus_move_pattern` match rate > 70% on named positions\n4.  Top-1 PV first-move match > 55%\n5.  Mean causal delta for `in_atari` > 0.05\n6.  Mean causal delta for `position_sharpness` > 0.03\n7.  Tier 1 concept accuracy ≥ 5 percentage points above the two-pass no-tool baseline\n8.  Human evaluation: ≥ 60% of prose-claim pairs rated \"supports the claim\"\n\n**Known Risks**\n*   Vacuous prose generation\n*   Claim tag appearing before reasoning\n*   Reward hacking through explanation style\n*   Faithfulness plateau\n*   Domain knowledge bottleneck\n*   Unfaithful verbose CoT\n*   Reward hacking at the PV boundary\n*   PV divergence at high move count\n\n**Deferred Phase 2 Decisions**\n*   Prose-claim consistency reward\n*   Prose quality optimization (RLHF/preference learning)\n*   `joseki_deviation` claim field\n*   Human SL model integration via KataGo `humanPolicy`\n*   Open-ended discovered concept slots\n*   Full-strength life-and-death solver\n*   veRL migration for multi-node scaling","error":null,"saved_to":"/home/user/.pplx/search/fetch/a4e999d9.json"}
