{"url":"https://www.perplexity.ai/search/6db52913-3dcd-4dac-9929-4112a9c1e28a","title":"Perplexity","description":"Perplexity is a free AI-powered answer engine that provides accurate, trusted, and real-time answers to any question.","authors":[],"published_date":null,"domain":"perplexity.ai","is_paywall":false,"is_cached":false,"content":"### Mathematical Formulas and Data Points\n\n| Variable/Parameter | Definition/Formula |\n| :--- | :--- |\n| Completion Group | $o^{(1)}, o^{(2)}, \\dots, o^{(G)}$ |\n| Scalar Reward | $r^{(i)} = r(\\text{board}, o^{(i)})$ |\n| Mean Reward | $\\bar{r} = \\frac{1}{G}\\sum_i r^{(i)}$ |\n| Standard Deviation of Rewards | $\\sigma_r = \\text{std}(r^{(1)}, \\dots, r^{(G)})$ |\n| Group-Normalized Advantage | $A^{(i)} = \\frac{r^{(i)} - \\bar{r}}{\\sigma_r + \\epsilon}$ |\n\n### Sequence and Architecture Structures\n\n| Type | Structure/Sequence |\n| :--- | :--- |\n| Full Sequence | `board description + explanation tokens + [MOVE]` |\n| Causal Chain | `... board … explanation …` |\n| Move Context | `[prompt (board)] + [explanation]` |\n| System Design Flow | `[BOARD AS TEXT] → [EXPLANATION TOKENS] → [MOVE TOKEN]` |\n| Safe Bottleneck | `board text → transformer → explanation tokens → transformer → move (LM head)` |\n| Broken Bottleneck Path 1 | `board → board encoder → board embedding → move head` |\n| Broken Bottleneck Path 2 | `board text → transformer → explanation tokens` |\n\n### Code and Token Snippets\n\n*   `board description + explanation tokens + [MOVE]`\n*   `... board … explanation …`\n*   `[prompt (board)] + [explanation]`\n*   `[BOARD AS TEXT] → [EXPLANATION TOKENS] → [MOVE TOKEN]`\n*   `move_head(board_embedding)`\n*   `move_logits = lm.forward(board_tokens + explanation_tokens)`\n*   `board text → transformer → explanation tokens → transformer → move (LM head)`\n*   `board → board encoder → board embedding → move head`\n*   `board text → transformer → explanation tokens`\n*   `<MOVE>` (Special marker)\n*   `\"Q16\"` (Example move token)\n\n### Specific Terms, Names, and Entities\n\n*   **Models/Frameworks:**\n    *   LoGos\n    *   KataGo\n    *   GRPO (Group Relative Policy Optimization)\n    *   Transformer\n    *   LM head\n*   **Concepts:**\n    *   Prefix-locked explanation\n    *   Explanation bottleneck\n    *   Unfaithful CoT (Chain of Thought)\n    *   Causal mask\n    *   Residual stream\n    *   Group-relative advantages\n    *   Implicit process reward\n    *   SGF-like string","error":null,"saved_to":"/home/user/.pplx/search/fetch/3f24ba38.json"}
