{"url":"https://www.perplexity.ai/search/e75fe5d3-5fa5-484f-8a3d-20ad8a9ca0af","title":"Perplexity","description":"Perplexity is a free AI-powered answer engine that provides accurate, trusted, and real-time answers to any question.","authors":[],"published_date":null,"domain":"perplexity.ai","is_paywall":false,"is_cached":false,"content":"### Filenames\n* `results_comparison.csv`\n* `README.md`\n* `claim_consistency_experiment.py`\n* `claim_consistency_coupling_experiment_executed.ipynb`\n* `results_comparison_hard.md`\n* `results_comparison_hard.csv`\n* `results_comparison_scaled.csv`\n* `results_hidden_state_intervention.csv`\n* `fever50k_full_20260428.csv`\n* `fever_pretrained_gpt2_experiment.py`\n* `fever50k_full_20260428.md`\n* `manifest.json`\n\n### Data Tables and Metrics\n\n#### Synthetic Coupling Test (Easy/Non-overlapping)\n| Metric | Consistency-Trained Variants | Baseline (No Consistency Loss) |\n| :--- | :--- | :--- |\n| Classifier Accuracy (Rationale Hidden States) | 100% | 3.9% |\n| Generation Accuracy (`full_sequence`) | 93.8% | 75% |\n| Generation Accuracy (`rationale_only`) | 63% | - |\n| Generation Accuracy (`earlier_token_only`) | 62% | - |\n| Counterfactual Swap-Following (Classifier) | 100% | 67% |\n| Original State Sticking (Classifier) | 0% | 3% |\n| Counterfactual Swap-Following (Gen - `full_seq`) | 93.8% | - |\n| Original State Sticking (Gen - `full_seq`) | 1.6% | - |\n| Shuffled-Pairing Control Accuracy | 8-10% | 8-10% |\n\n#### Synthetic Experiment Parameters\n| Parameter | Value |\n| :--- | :--- |\n| Training Samples | 512 |\n| Evaluation Samples | 128 |\n| Counterfactual Samples | 64 |\n| Latent States | 8 |\n| Training Epochs | 5 |\n\n#### Hard Synthetic Experiment (50% Overlapping Vocabulary)\n| Metric | Consistency-Trained Variants | Baseline (No Consistency Loss) |\n| :--- | :--- | :--- |\n| Classifier Accuracy | 100% | 4.7% |\n| Counterfactual Swap-Following (Classifier) | 100% | 6.3% |\n| Generation Accuracy (General) | 81-100% | 100% |\n| Generation Accuracy (`rationale_only`) | 100% | - |\n| Generation Accuracy (`earlier_token_only`) | 100% | - |\n| Generation Accuracy (`full_sequence`) | 81% | - |\n| Counterfactual Gen Swap-Following (`full_seq`) | 77% | - |\n\n#### FEVER Experiment (Pretrained GPT-2)\n| Variant | Classifier Accuracy | Counterfactual Swap-Following | Generation Accuracy |\n| :--- | :--- | :--- | :--- |\n| `full_sequence_pooling` | 84% | 28-48% | - |\n| `claim_only_pooling` | 83% | 28-48% | - |\n| `evidence_only_pooling` | 44% | 48% | - |\n| `no_consistency_loss` (Baseline) | 40% | - | 80% |\n\n#### Causal Intervention Results\n| Metric | Result |\n| :--- | :--- |\n| Hidden-State Intervention Success | 73-89% |\n| `claim_only` Control Coupling | 43% |\n\n### JSON Snippet\n```json\n{\"status\": \"safe\", \"confidence\": 0.92}\n```\n\n### Downloadable Artifacts\n* `results_comparison.csv`\n* `results_comparison_hard.csv`\n* `results_comparison_scaled.csv`\n* `results_hidden_state_intervention.csv`\n* `fever50k_full_20260428.csv`\n* `manifest.json`\n* `README.md`\n* `results_comparison_hard.md`\n* `fever50k_full_20260428.md`\n* `claim_consistency_experiment.py`\n* `fever_pretrained_gpt2_experiment.py`\n* `claim_consistency_coupling_experiment_executed.ipynb`","error":null,"saved_to":"/home/user/.pplx/search/fetch/a1732dc9.json"}
