### Code Blocks

**Oracle Implementation Snippet**
```python
def oracle_verify(func_code: str, func_name: str) -> dict:
    """
    Returns all 12 claim types by:
    - Parsing the AST for loop structure, mutations, and operations
    - Running test cases for correctness and edge cases
    - Computing complexity from loop nesting depth + recursion
    - Pattern matching for algorithm class (keywords, structure)
    """
    # The analysis itself is not shown; this hard-coded result matches the
    # bubble sort example used throughout these notes.
    return {
        'time_complexity': 'O_n2',
        'space_complexity': 'O_1',
        'best_case_time': 'O_n2',
        'algorithm_class': 'sorting',
        'loop_structure': 'nested_2_level',
        'key_operation': 'comparison',
        'access_pattern': 'sequential',
        'auxiliary_structures': 'temp_variable',
        'mutates_input': True,
        'correctness_status': 'fully_correct',
        'handles_empty_input': True,
        'handles_duplicates': 'preserves_all',
    }
```

**Dataset Structure (dataset_rich.py)**
```python
from dataclasses import dataclass
from typing import List


@dataclass
class RichExample:
    idx: int
    code_snippet: str
    true_explanation: str  # 3-5 sentences
    mismatched_explanation: str
    # 12 claim types as separate fields
    time_complexity: str
    space_complexity: str
    best_case_time: str
    algorithm_class: str
    loop_structure: str
    key_operation: str
    access_pattern: str
    auxiliary_structures: str
    mutates_input: bool
    correctness_status: str
    handles_empty_input: bool  # the ontology also lists 'crashes', which a bool cannot represent
    handles_duplicates: str
    template_name: str


def build_rich_dataset(n=2500, seed=42) -> List[RichExample]:
    """Generate n examples using the programmatic oracle."""
    ...
```

**Tokenizer Extensions**
```python
CLAIM_TOKENS = [
    'time_complexity=O_1', 'time_complexity=O_n', 'time_complexity=O_n2', ...,
    'algorithm_class=sorting', 'algorithm_class=searching', ...,
    'loop_structure=single_pass', 'loop_structure=nested_2_level', ...,
    # ... all claim value combinations
]
```

**Target Sequence Format**
```text
<bos> {code} <sep> {explanation}
<claim>time_complexity=O_n2</claim>
<claim>space_complexity=O_1</claim>
<claim>best_case_time=O_n2</claim>
<claim>algorithm_class=sorting</claim>
<claim>loop_structure=nested_2_level</claim>
<claim>key_operation=comparison</claim>
<claim>access_pattern=sequential</claim>
<claim>auxiliary_structures=temp_variable</claim>
<claim>mutates_input=true</claim>
<claim>correctness_status=fully_correct</claim>
<claim>handles_empty_input=true</claim>
<claim>handles_duplicates=preserves_all</claim>
<eos>
```

**Model Architecture Changes (model.py)**
```python
import torch.nn as nn

# In model.py, inside the model's __init__. Each head projects a d_model-dimensional
# representation to logits over the values of one claim type.
self.time_head = nn.Linear(cfg.d_model, 6)          # 6 time complexity bins
self.space_head = nn.Linear(cfg.d_model, 4)         # 4 space bins
self.bestcase_head = nn.Linear(cfg.d_model, 6)      # 6 best-case bins
self.algclass_head = nn.Linear(cfg.d_model, 7)      # 7 algorithm classes
self.loopstruct_head = nn.Linear(cfg.d_model, 6)    # 6 loop structures
self.keyop_head = nn.Linear(cfg.d_model, 7)         # 7 key operations
self.accesspat_head = nn.Linear(cfg.d_model, 5)     # 5 access patterns
self.auxstruct_head = nn.Linear(cfg.d_model, 7)     # 7 auxiliary structures
self.mutates_head = nn.Linear(cfg.d_model, 2)       # true / false
self.correctness_head = nn.Linear(cfg.d_model, 5)   # 5 correctness statuses
self.emptyinput_head = nn.Linear(cfg.d_model, 3)    # true / false / crashes
self.duplicates_head = nn.Linear(cfg.d_model, 4)    # 4 duplicate-handling behaviors

# Consistency loss averages over all 12 heads
```

**Execution Commands (run_rich_experiment.py)**
```bash
python run_rich_experiment.py --full   # 20 epochs, 2500 examples
python run_rich_experiment.py --smoke  # 3 epochs, 300 examples for testing
```

**Bubble Sort Explanation Ground-Truth Example**
```text
"This function implements bubble sort by repeatedly comparing adjacent elements and swapping them if out of order. It uses nested loops to make multiple passes over the list, resulting in O(n²) time complexity in all cases. The algorithm modifies the input list in-place using only constant auxiliary space for the swap operation. It correctly handles empty lists and preserves duplicate values while maintaining stability."
```

**Garbled Output Example (Main Variant)**
```text
"tatit p er nation ch esap er sarsandcondites in gparsiteandsitev in g in gleandicantion in s.
Time O n space O1. Best-case O1 when stelement. Keyoperation is ss..."
```

**Adversarial Minimal Pairs Snippet**
```python
# Template A: Correct binary search (returns the leftmost insertion point)
def search(arr, target):
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


# Template B: Off-by-one bug (< vs <=)
def search(arr, target):
    lo, hi = 0, len(arr)
    while lo <= hi:  # BUG: can index arr[len(arr)] or loop forever once lo == hi
        mid = (lo + hi) // 2
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo
```

**Pretrained Model Variants List**
```text
Variants to run:
1. pretrained_frozen                # Zero-shot, no training
2. pretrained_lm_only               # Fine-tune on mismatched pairs (corruption test)
3. pretrained_consistency           # LM + consistency loss (main hypothesis)
4. pretrained_claim_only            # Consistency on claims, not explanations
5. pretrained_expl_only_verifier    # Claims verified from text only
6. pretrained_surface_bottleneck    # Consistency from token logits
```

### Filenames Mentioned
*   `dataset_rich.py`
*   `model_rich.py`
*   `trainer_rich.py`
*   `run_rich_experiment.py`
*   `manual_review_scores_rich_full.csv`
*   `manual_review_annotations_rich_full.csv`
*   `report.pplx.md`
*   `qualitative_side_by_side_rich.pplx.md`
*   `manual_qualitative_review_rich_full.pplx.md`
*   `dataset.py`
*   `model.py`

### Data Tables and Metric Summaries

**Rich Results Summary Table** (values rounded to four decimals)
| Variant | Mean coupling | Swap influence | BLEU-1 | ROUGE-L | Manual usable | Manual correct |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Consistency Loss (Main) | 0.9860 | 0.9778 | 0.0611 | 0.0612 | 0 | 0 |
| No Consistency Loss | 0.2863 | -0.0333 | 0.0686 | 0.0617 | 0 | 0 |
| Claim-Only Pooling | 0.5947 | 0.1222 | 0.0554 | 0.0479 | 0 | 0 |
| Random Label Consistency | 0.5620 | -0.0222 | 0.0593 | 0.0526 | 0 | 0 |
| No Claim↔Claim Attn (V2) | 0.9858 | 0.9778 | 0.0614 | 0.0595 | 0 | 0 |
| Claims from Expl Only (V2) | 0.9860 | 0.9778 | 0.0558 | 0.0570 | 0 | 0 |
| Surface Bottleneck (V2) | 0.5502 | -0.0333 | 0.0688 | 0.0646 | 0 | 0 |
| Surface Btlnk No Expl LM (V2) | 0.5470 | -0.0222 | 0.0000 | 0.0000 | 0 | 0 |

**Structural Parallel: Go vs. Code (Complexity)**
| Dimension | Go (KataGo) | Code (Complexity) |
| :--- | :--- | :--- |
| Generated text | Move commentary | Algorithm explanation |
| Verifiable claim | Win probability bin | Complexity class |
| Oracle | KataGo analysis | Static analyzer |
| Claim format | Scalar/categorical | Categorical (O-notation) |

**Structural Parallel: Go vs. Code (Explanations)**
| Aspect | Go Domain | Code Domain |
| :--- | :--- | :--- |
| Input | Board position | Code snippet |
| Generated output | Natural language commentary | Natural language explanation |
| Interleaved claims | Win-prob, score-lead, urgency | Complexity, correctness, behavior |
| Oracle | KataGo analysis | Static analyzer / test suite |

**Comparison Questions for Pretrained Models**
| Question | Comparison |
| :--- | :--- |
| Does fine-tuning corrupt explanations? | `frozen` vs `lm_only` |
| Does consistency prevent corruption? | `lm_only` vs `consistency` |
| Is the mechanism representation or text? | `consistency` vs `expl_only_verifier` |
| Do claims need to be verbalized? | `consistency` vs `expl_only_verifier` |

### Statistics and Specific Metrics

**FEVER Proxy Results**
*   Accuracy: 100%
*   Faithfulness: 100%
*   Swap influence: 0%

**Rich-Claim Experiment Summary Metrics**
*   Mean coupling (Main Variant): 98.6%
*   Swap influence (Main Variant): 97.8%
*   Mean coupling (No Consistency Baseline): 28.6%
*   Random labels coupling: 56.2%
*   Claim-only pooling coupling: 59.5%
*   Surface bottleneck coupling: 55.0%
*   Manual review usable explanations: 0/20
*   BLEU-1 (Main Variant): ~0.061
*   Claim emission accuracy: 2.2%

**Target Validation Metrics (Target Goals)**
*   Classifier accuracy: >90% (KataGo parallel)
*   Swap-following: >95%
*   Mean coupling (Rich Experiment): >85%
*   BLEU-1 / ROUGE-L: 0.30-0.50

**Data and Scale Requirements**
*   Rich dataset training: 2,500 examples
*   Rich dataset validation: 500 examples
*   Algorithm complexity dataset: 2,000-5,000 algorithm implementations
*   Minimum for an initial coupling signal: 2,000-3,000 examples
*   Publication-quality requirement: 5,000-10,000 examples
*   Training duration: 10-20 epochs (1-2 days on a single GPU)
*   Model parameter range: 1-3B or 7B (e.g., CodeLlama 7B, StarCoder 1B; CodeGen-small 350M as a smaller option)

### Lists of Specifications

**Rich Claim Ontology (12 Types)**
1.  **Complexity Claims**:
    *   `time_complexity`: {O_1, O_log_n, O_n, O_n_log_n, O_n2, O_2n}
    *   `space_complexity`: {O_1, O_log_n, O_n, O_n2}
    *   `best_case_time`: {O_1, O_log_n, O_n, O_n_log_n, O_n2, same_as_worst}
2.  **Algorithmic Structure Claims**:
    *   `algorithm_class`: {sorting, searching, traversal, dynamic_programming, greedy, math_computation, string_processing}
    *   `loop_structure`: {no_loops, single_pass, nested_2_level, nested_3_level, recursive, iterative_with_recursion}
    *   `key_operation`: {comparison, arithmetic, hash_lookup, list_append, string_concat, swap, assignment}
    *   `access_pattern`: {sequential, random_access, sliding_window, two_pointer, divide_conquer}
3.  **Data Structure Claims**:
    *   `auxiliary_structures`: {none, temp_variable, array, hash_map, set, stack, recursion_stack}
    *   `mutates_input`: {true, false}
4.  **Correctness Claims**:
    *   `correctness_status`: {fully_correct, off_by_one, wrong_condition, missing_edge_case, infinite_loop_risk}
    *   `handles_empty_input`: {true, false, crashes}
    *   `handles_duplicates`: {preserves_all, removes_duplicates, undefined_behavior, not_applicable}

**Code Generation Specs for Dataset**
*   Length: 5-20 lines
*   Bug rate: ~20%
*   Edge cases: empty input, single element, duplicates, negative numbers, None values

### Downloadable Artifacts
*No download links appear on the page; the following files are cited as part of the experiment's output:*
*   `manual_review_scores_rich_full.csv`
*   `manual_review_annotations_rich_full.csv`
*   `rich_results_summary` (implied CSV/table)
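### Illustrative Sketches

The **Oracle Implementation Snippet** returns a hard-coded result, although its docstring describes deriving claims from the AST and from loop nesting depth. Below is a minimal sketch of that static part only, built on Python's standard `ast` module. The names `oracle_sketch`, `max_loop_depth`, `is_recursive`, and `mutates_param` are hypothetical and are not the project's code.

The depth-to-bin mapping is deliberately naive. It would label a halving `while` loop such as binary search `O_n` instead of `O_log_n`, it gives no time bin for recursion or triple nesting, and it runs no test cases. The bubble sort source is also illustrative, because the notes show only its explanation.

```python
import ast

# Naive, hypothetical mappings from maximum loop-nesting depth to ontology values.
DEPTH_TO_TIME = {0: "O_1", 1: "O_n", 2: "O_n2"}
DEPTH_TO_LOOPS = {0: "no_loops", 1: "single_pass", 2: "nested_2_level", 3: "nested_3_level"}
MUTATING_METHODS = {"append", "extend", "insert", "pop", "remove", "sort", "reverse", "clear"}


def max_loop_depth(node: ast.AST, depth: int = 0) -> int:
    """Deepest nesting of for/while loops at or below `node`."""
    best = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, (ast.For, ast.While)) else depth
        best = max(best, max_loop_depth(child, child_depth))
    return best


def is_recursive(func: ast.FunctionDef) -> bool:
    """True if the function calls itself by name."""
    return any(
        isinstance(n, ast.Call) and isinstance(n.func, ast.Name) and n.func.id == func.name
        for n in ast.walk(func)
    )


def mutates_param(func: ast.FunctionDef) -> bool:
    """True if a parameter is written through a subscript or has a mutating method called."""
    params = {a.arg for a in func.args.args}
    for n in ast.walk(func):
        if isinstance(n, ast.Assign):
            targets = n.targets
        elif isinstance(n, ast.AugAssign):
            targets = [n.target]
        else:
            targets = []
        for t in targets:
            for sub in ast.walk(t):
                if (isinstance(sub, ast.Subscript) and isinstance(sub.ctx, ast.Store)
                        and isinstance(sub.value, ast.Name) and sub.value.id in params):
                    return True
        if (isinstance(n, ast.Call) and isinstance(n.func, ast.Attribute)
                and isinstance(n.func.value, ast.Name) and n.func.value.id in params
                and n.func.attr in MUTATING_METHODS):
            return True
    return False


def oracle_sketch(func_code: str, func_name: str) -> dict:
    """Static subset of the oracle: loop_structure, time_complexity, mutates_input."""
    tree = ast.parse(func_code)
    func = next(n for n in ast.walk(tree)
                if isinstance(n, ast.FunctionDef) and n.name == func_name)
    depth = max_loop_depth(func)
    recursive = is_recursive(func)
    if recursive:
        loop_structure = "iterative_with_recursion" if depth else "recursive"
    else:
        loop_structure = DEPTH_TO_LOOPS[min(depth, 3)]
    return {
        "loop_structure": loop_structure,
        # None = no static guess (recursion, or deeper nesting than the bins cover).
        "time_complexity": None if recursive else DEPTH_TO_TIME.get(depth),
        "mutates_input": mutates_param(func),
    }


# Illustrative bubble sort (hypothetical source; the notes show only its explanation).
BUBBLE_SORT = """
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr
"""

print(oracle_sketch(BUBBLE_SORT, "bubble_sort"))
# {'loop_structure': 'nested_2_level', 'time_complexity': 'O_n2', 'mutates_input': True}
```

Correctness claims such as `correctness_status` and `handles_empty_input` still need the test-case execution the docstring mentions. Static structure alone cannot decide them.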
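The `CLAIM_TOKENS` list under **Tokenizer Extensions** is elided with `...`. Suppose the full vocabulary is exactly one token per (claim type, value) pair from the **Rich Claim Ontology (12 Types)**. Then it can be generated from a single table, and the same table also gives each classification head's output size. `ONTOLOGY` and `HEAD_SIZES` are hypothetical names. The derived sizes (6, 4, 6, 7, 6, 7, 5, 7, 2, 5, 3, 4) agree with the `nn.Linear` widths in the **Model Architecture Changes (model.py)** snippet.

```python
# Claim ontology transcribed from the "Rich Claim Ontology (12 Types)" list.
ONTOLOGY = {
    "time_complexity": ["O_1", "O_log_n", "O_n", "O_n_log_n", "O_n2", "O_2n"],
    "space_complexity": ["O_1", "O_log_n", "O_n", "O_n2"],
    "best_case_time": ["O_1", "O_log_n", "O_n", "O_n_log_n", "O_n2", "same_as_worst"],
    "algorithm_class": ["sorting", "searching", "traversal", "dynamic_programming",
                        "greedy", "math_computation", "string_processing"],
    "loop_structure": ["no_loops", "single_pass", "nested_2_level", "nested_3_level",
                       "recursive", "iterative_with_recursion"],
    "key_operation": ["comparison", "arithmetic", "hash_lookup", "list_append",
                      "string_concat", "swap", "assignment"],
    "access_pattern": ["sequential", "random_access", "sliding_window", "two_pointer",
                       "divide_conquer"],
    "auxiliary_structures": ["none", "temp_variable", "array", "hash_map", "set",
                             "stack", "recursion_stack"],
    "mutates_input": ["true", "false"],
    "correctness_status": ["fully_correct", "off_by_one", "wrong_condition",
                           "missing_edge_case", "infinite_loop_risk"],
    "handles_empty_input": ["true", "false", "crashes"],
    "handles_duplicates": ["preserves_all", "removes_duplicates",
                           "undefined_behavior", "not_applicable"],
}

# One special token per (claim type, value) pair, in ontology order.
CLAIM_TOKENS = [f"{claim}={value}" for claim, values in ONTOLOGY.items() for value in values]

# Output width of each classification head = number of values for that claim type.
HEAD_SIZES = {claim: len(values) for claim, values in ONTOLOGY.items()}

print(len(CLAIM_TOKENS))               # 62
print(HEAD_SIZES["algorithm_class"])   # 7
```

Generating both lists from one table keeps the tokenizer and the heads in step if a claim value is later added or removed.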
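The **Target Sequence Format** can be produced and read back with a few string operations. This sketch rests on three assumptions:

*   Claims are always written in the fixed order shown in the example.
*   Booleans are lowercased (`mutates_input=true`).
*   Claim values contain only letters, digits, and underscores.

`build_target`, `parse_claims`, and `emission_accuracy` are hypothetical helpers. The notes do not define "claim emission accuracy", so `emission_accuracy` uses one plausible reading: the share of the 12 claim types whose emitted value matches the oracle.

```python
import re

# Fixed claim order, as in the Target Sequence Format example.
CLAIM_ORDER = [
    "time_complexity", "space_complexity", "best_case_time", "algorithm_class",
    "loop_structure", "key_operation", "access_pattern", "auxiliary_structures",
    "mutates_input", "correctness_status", "handles_empty_input", "handles_duplicates",
]
CLAIM_RE = re.compile(r"<claim>([a-z_]+)=([A-Za-z0-9_]+)</claim>")


def format_value(value) -> str:
    """Render a claim value; booleans become lowercase `true`/`false`."""
    return str(value).lower() if isinstance(value, bool) else str(value)


def build_target(code: str, explanation: str, claims: dict) -> str:
    """Serialize one example into the <bos> ... <eos> target layout."""
    lines = [f"<bos> {code} <sep> {explanation}"]
    lines += [f"<claim>{key}={format_value(claims[key])}</claim>" for key in CLAIM_ORDER]
    lines.append("<eos>")
    return "\n".join(lines)


def parse_claims(text: str) -> dict:
    """Extract {claim_type: value} from generated text; a repeated claim keeps its last value."""
    return dict(CLAIM_RE.findall(text))


def emission_accuracy(generated: str, oracle: dict) -> float:
    """Assumed definition: share of claim types whose emitted value matches the oracle."""
    emitted = parse_claims(generated)
    hits = sum(emitted.get(key) == format_value(oracle[key]) for key in CLAIM_ORDER)
    return hits / len(CLAIM_ORDER)


# Oracle output for the bubble sort example, copied from the Oracle Implementation Snippet.
BUBBLE_CLAIMS = {
    "time_complexity": "O_n2", "space_complexity": "O_1", "best_case_time": "O_n2",
    "algorithm_class": "sorting", "loop_structure": "nested_2_level",
    "key_operation": "comparison", "access_pattern": "sequential",
    "auxiliary_structures": "temp_variable", "mutates_input": True,
    "correctness_status": "fully_correct", "handles_empty_input": True,
    "handles_duplicates": "preserves_all",
}

target = build_target("def bubble_sort(arr): ...", "This function implements bubble sort.", BUBBLE_CLAIMS)
print(target.splitlines()[9])                    # <claim>mutates_input=true</claim>
print(emission_accuracy(target, BUBBLE_CLAIMS))  # 1.0
```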
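The **Model Architecture Changes (model.py)** snippet says only that the consistency loss averages over all 12 heads; it does not define the per-head term. One plausible reading, assumed here, is a cross-entropy per head against the oracle's label index, averaged with equal weight. The sketch is framework-free so the arithmetic is visible. In PyTorch, the per-head term would typically be `torch.nn.functional.cross_entropy` applied to each head's logits. `cross_entropy` and `mean_head_loss` are hypothetical names.

```python
import math
from typing import Dict, List


def cross_entropy(logits: List[float], target: int) -> float:
    """Negative log-softmax probability of `target`, computed stably via the max-shift trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mean_head_loss(head_logits: Dict[str, List[float]], targets: Dict[str, int]) -> float:
    """Average per-head losses with equal weight (assumed reading of "averages over all 12 heads")."""
    losses = [cross_entropy(logits, targets[name]) for name, logits in head_logits.items()]
    return sum(losses) / len(losses)


# Uniform logits over 6 and 4 classes give per-head losses ln 6 and ln 4.
print(mean_head_loss({"time": [0.0] * 6, "space": [0.0] * 4}, {"time": 4, "space": 0}))
```

With equal weighting, a head at chance level contributes ln k for k classes, so a 2-way head such as `mutates_input` adds ln 2 while a 7-way head adds ln 7.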
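For the **Adversarial Minimal Pairs Snippet**, an oracle can confirm Template A dynamically and flag Template B statically. Template A is a lower-bound search, so it should agree with the standard library's `bisect.bisect_left` on every sorted list. Template B cannot simply be executed. With `hi = len(arr)`, its `<=` condition can read `arr[len(arr)]` (IndexError) or stop making progress once `lo == hi`.

The sketch renames Template A to `search_a` so it can sit beside the checker. `matches_bisect_left` and `has_inclusive_bound` are hypothetical, and the `<=` check only recognizes this particular pair.

```python
import ast
import bisect
import random


def search_a(arr, target):
    # Template A, renamed so it can coexist with the checker in one module.
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def matches_bisect_left(func, trials=500, seed=0) -> bool:
    """Differential test: compare func with bisect.bisect_left on random sorted lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        arr = sorted(rng.randint(-5, 5) for _ in range(rng.randint(0, 8)))
        target = rng.randint(-7, 7)
        if func(arr, target) != bisect.bisect_left(arr, target):
            return False
    return True


def has_inclusive_bound(src: str) -> bool:
    """Flag a `while ... <= ...` loop, the one-token edit that defines Template B."""
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.While) and isinstance(node.test, ast.Compare):
            if any(isinstance(op, ast.LtE) for op in node.test.ops):
                return True
    return False


TEMPLATE_B = """
def search(arr, target):
    lo, hi = 0, len(arr)
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo
"""

print(matches_bisect_left(search_a))     # True
print(has_inclusive_bound(TEMPLATE_B))   # True
```

The differential test exercises empty lists and duplicates, two of the edge cases listed under **Code Generation Specs for Dataset**.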
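The BLEU_1 and ROUGE_L columns in the **Rich Results Summary Table** score generated explanations against reference text. For reference, here is a minimal single-reference sketch using whitespace tokenization. BLEU-1 is computed as clipped unigram precision times the brevity penalty, and ROUGE-L as the F1 of the longest common subsequence. The notes do not say which implementation produced their numbers. Packages such as `sacrebleu` and `rouge-score` tokenize and aggregate differently, so do not expect this sketch to reproduce the table exactly. `bleu1`, `lcs_length`, and `rouge_l` are hypothetical names.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times the brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    # Each candidate token counts at most as many times as it appears in the reference.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    precision = overlap / len(cand)
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * precision


def lcs_length(a, b) -> int:
    """Longest common subsequence length via a rolling one-row dynamic program."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1 from LCS-based precision and recall."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(cand), lcs / len(ref)
    return 2 * p * r / (p + r)


print(round(bleu1("the cat", "the cat sat"), 4))    # 0.6065
print(round(rouge_l("the cat", "the cat sat"), 4))  # 0.8
```

Both scores are based on whitespace tokens, so punctuation attached to a word (for example `order.`) counts as a distinct token.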
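Checking the **Rich Results Summary Table** against the **Target Validation Metrics** is simple arithmetic. The sketch copies mean coupling and BLEU-1 for each variant, rounded to four decimals. It then tests each variant against the >85% mean-coupling goal and the 0.30-0.50 BLEU-1 range. `RESULTS` and `check_targets` are hypothetical names.

```python
# (variant, mean coupling, BLEU-1) from the Rich Results Summary Table, rounded to 4 decimals.
RESULTS = [
    ("Consistency Loss (Main)", 0.9860, 0.0611),
    ("No Consistency Loss", 0.2863, 0.0686),
    ("Claim-Only Pooling", 0.5947, 0.0554),
    ("Random Label Consistency", 0.5620, 0.0593),
    ("No Claim↔Claim Attn (V2)", 0.9858, 0.0614),
    ("Claims from Expl Only (V2)", 0.9860, 0.0558),
    ("Surface Bottleneck (V2)", 0.5502, 0.0688),
    ("Surface Btlnk No Expl LM (V2)", 0.5470, 0.0000),
]
COUPLING_GOAL = 0.85        # Mean coupling (Rich Experiment): >85%
BLEU1_GOAL = (0.30, 0.50)   # BLEU-1 / ROUGE-L: 0.30-0.50


def check_targets(results):
    """Return (variant, meets coupling goal, meets BLEU-1 goal) for each row."""
    low, high = BLEU1_GOAL
    return [(name, coupling > COUPLING_GOAL, low <= bleu1 <= high)
            for name, coupling, bleu1 in results]


for name, coupling_ok, bleu_ok in check_targets(RESULTS):
    print(f"{name:<32} coupling_ok={coupling_ok!s:<5} bleu1_ok={bleu_ok}")
```

Three variants clear the coupling goal, and none comes close to the BLEU-1 range, which is consistent with the 0/20 manual usability result.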
