### Files Mentioned
* `fever_pretrained_gpt2_experiment.py`
* `fever50k_full_20260428.md`
* `modal_fever_run.py`
* `run_fever_pretrained_gpu.py`

### Code Blocks

**Gradient blocking for non-evidence tokens (Python)**
```python
# inside forward(), after hidden_states = outputs.last_hidden_state
if pooling_mode == "evidence_only_pooling":
    # Zero every hidden state from [SEP] onward (claim, [LABELSEP], label,
    # EOS, padding) so only evidence positions feed the consistency head.
    # Zeroed positions also pass no gradient back through this path.
    masked_hs = hidden_states.clone()
    B = hidden_states.size(0)  # batch size
    for i in range(B):
        # sep_pos[i] is the index of [SEP] in sequence i
        start = int(sep_pos[i])
        masked_hs[i, start:, :] = 0.0
    pooled = self._pool(masked_hs, pooling_mode, sep_pos, labelsep_pos, attention_mask)
else:
    pooled = self._pool(hidden_states, pooling_mode, sep_pos, labelsep_pos, attention_mask)
cons_logits = self.consistency_head(pooled)
```

**Recommended Modal Run Command (Bash)**
```bash
python modal_fever_run.py \
 --model_name gpt2 \
 --train_samples 50000 \
 --eval_samples 5000 \
 --max_seq_len 256 \
 --epochs 5 \
 --batch_size 16 \
 --lr 5e-5 \
 --consistency_loss_weight 0.5 \
 --freeze_lower_layers_epochs 1 \
 --seed 42 \
 --output_stem fever50k_tightened_diag \
 --variants no_consistency_loss,evidence_only_pooling,evidence_only_strict,claim_only_pooling,evidence_only_random_labels
```
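
The flags in the command above could be parsed roughly as follows. This is a minimal sketch, not the actual CLI of `modal_fever_run.py`: `build_parser`, `parse_variants`, and `KNOWN_VARIANTS` are hypothetical names, and the defaults simply mirror the values used in the command.

```python
import argparse
from typing import List

# Assumption: the set of accepted names is the variant list from these notes.
KNOWN_VARIANTS = {
    "no_consistency_loss",
    "evidence_only_pooling",
    "evidence_only_strict",
    "claim_only_pooling",
    "evidence_only_random_labels",
    "last_four_tokens_pooling",
}


def parse_variants(raw: str) -> List[str]:
    """Split a comma-separated variant string and reject unknown names."""
    names = [v.strip() for v in raw.split(",") if v.strip()]
    unknown = [v for v in names if v not in KNOWN_VARIANTS]
    if unknown:
        raise argparse.ArgumentTypeError(f"unknown variants: {unknown}")
    return names


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags in the run command."""
    p = argparse.ArgumentParser(description="FEVER diagnostic run (sketch)")
    p.add_argument("--model_name", default="gpt2")
    p.add_argument("--train_samples", type=int, default=50000)
    p.add_argument("--eval_samples", type=int, default=5000)
    p.add_argument("--max_seq_len", type=int, default=256)
    p.add_argument("--epochs", type=int, default=5)
    p.add_argument("--batch_size", type=int, default=16)
    p.add_argument("--lr", type=float, default=5e-5)
    p.add_argument("--consistency_loss_weight", type=float, default=0.5)
    p.add_argument("--freeze_lower_layers_epochs", type=int, default=1)
    p.add_argument("--seed", type=int, default=42)
    p.add_argument("--output_stem", default="fever50k_tightened_diag")
    p.add_argument("--variants", type=parse_variants,
                   default=["no_consistency_loss"])
    return p


args = build_parser().parse_args(
    ["--variants", "no_consistency_loss,evidence_only_strict", "--lr", "3e-5"]
)
print(args.variants, args.lr)
```

Validating variant names at parse time means a typo fails before any GPU time is spent.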

### Metrics and Data Points

**Evaluation Metrics**
| Metric Name | Description |
| :--- | :--- |
| `cls_claim_acc` | Classification accuracy of the consistency head. |
| `gen_claim_acc` | Accuracy of the language model generating the label token. |
| `cfact_cls_follows_swap` | Fraction of evidence-swapped examples where the classifier predicts the label of the swapped-in evidence. |
| `cfact_cls_follows_orig` | Fraction of evidence-swapped examples where the classifier still predicts the original label. |
| `matched_cfact_cls_follows_swap` | Same as `cfact_cls_follows_swap`, with swaps restricted to lexically similar claims that have different labels. |
| `matched_cfact_cls_follows_orig` | Same as `cfact_cls_follows_orig`, for matched-claim swaps. |
| `Δcls_claim_acc` | Difference in `cls_claim_acc` vs. the `no_consistency_loss` baseline. |
| `Δcfact_cls_follows_swap` | Change in `cfact_cls_follows_swap` vs. the baseline. |
| `Δcfact_cls_follows_orig` | Change in `cfact_cls_follows_orig` vs. the baseline. |
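
The counterfactual rates and their deltas could be computed along these lines. This is a sketch under stated assumptions, not the experiment's own evaluation code: the function names are hypothetical, swaps that leave the gold label unchanged are skipped (an assumption, so that "follows swap" and "follows orig" cannot both be true), and each delta is taken as variant minus baseline.

```python
from typing import Dict, List


def counterfactual_rates(
    preds: List[str], orig_labels: List[str], swap_labels: List[str]
) -> Dict[str, float]:
    """Fraction of swapped-evidence examples whose prediction matches each label."""
    # Assumption: only count swaps that actually change the gold label.
    pairs = [
        (p, o, s)
        for p, o, s in zip(preds, orig_labels, swap_labels)
        if o != s
    ]
    n = len(pairs)
    if n == 0:
        return {"cfact_cls_follows_swap": 0.0, "cfact_cls_follows_orig": 0.0}
    return {
        "cfact_cls_follows_swap": sum(p == s for p, _, s in pairs) / n,
        "cfact_cls_follows_orig": sum(p == o for p, o, _ in pairs) / n,
    }


def deltas_vs_baseline(
    variant: Dict[str, float], baseline: Dict[str, float]
) -> Dict[str, float]:
    """Variant minus baseline for every metric present in both dicts."""
    return {f"Δ{k}": variant[k] - baseline[k] for k in variant if k in baseline}


# Toy data only; these are not results from any run.
preds = ["REFUTES", "SUPPORTS", "NEI", "SUPPORTS"]
orig = ["SUPPORTS", "SUPPORTS", "REFUTES", "NEI"]
swap = ["REFUTES", "REFUTES", "SUPPORTS", "NEI"]
rates = counterfactual_rates(preds, orig, swap)
baseline = {"cfact_cls_follows_swap": 0.25, "cfact_cls_follows_orig": 0.5}
print(rates)
print(deltas_vs_baseline(rates, baseline))
```

The two rates need not sum to one: a prediction that matches neither label (the third toy example) counts toward neither.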

**Labels**
* SUPPORTS
* REFUTES
* NEI (Not Enough Info)

**Experimental Parameters**
| Parameter | Value |
| :--- | :--- |
| Model Name | gpt2 |
| Training Samples | 50,000 |
| Evaluation Samples | 5,000 |
| Max Sequence Length | 256 |
| Epochs | 5 |
| Batch Size | 16 |
| Learning Rate | 5e-5 |
| Consistency Loss Weight | 0.5 |
| Freeze Lower Layers Epochs | 1 |
| Seed | 42 |
| Output Stem | fever50k_tightened_diag |

### Experimental Variants
* `no_consistency_loss`
* `evidence_only_pooling`
* `evidence_only_strict` (zeroes non-evidence hidden states)
* `claim_only_pooling`
* `evidence_only_random_labels` (consistency head trained on permuted labels)
* `last_four_tokens_pooling` (fixed tail span baseline)
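
The permuted-label control for `evidence_only_random_labels` can be sketched as below. `permute_labels` is a hypothetical helper, and the use of a seeded shuffle is an assumption about how the permutation is drawn. Shuffling keeps the label distribution intact while breaking the pairing between evidence and label.

```python
import random
from typing import List


def permute_labels(labels: List[str], seed: int = 42) -> List[str]:
    """Return a shuffled copy of labels; the label counts are unchanged."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


labels = ["SUPPORTS", "REFUTES", "NEI", "SUPPORTS", "REFUTES", "SUPPORTS"]
permuted = permute_labels(labels, seed=42)
print(permuted)
```

If the consistency head trained on these targets still appears to help, the gain is unlikely to come from evidence-grounded supervision.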

### Project Components and Steps

**General Mechanism Pattern**
1. The LM emits prose plus inline claims.
2. A strong verifier or oracle checks the claims.
3. Prose quality improves via shared hidden states and the consistency loss.
4. Optionally, claims are restricted to a fixed ontology.

**Recommended Sequence**
1. FEVER (NLP validation)
2. Another fact/NLI benchmark
3. Structured-domain oracle (KataGo or code execution)
4. Ablations on claim format and evidence noise

**Best Next Steps**
1. Run an out-of-domain verifier task.
2. Test a different claim format (varying expression/length).
3. Stress the verifier boundary (adversarial swaps, noisy retrieval).
4. Measure transfer (coherence, factual consistency, calibration).
5. Scale to a second real domain (medical QA, code, math).
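
For the calibration part of step 4, one common measure is expected calibration error (ECE). The sketch below is a generic equal-width-bin implementation, not something taken from the experiment code. The function name and the choice of 10 bins are assumptions.

```python
from typing import List


def expected_calibration_error(
    confidences: List[float], correct: List[bool], n_bins: int = 10
) -> float:
    """Weighted mean of |avg confidence - accuracy| over equal-width bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(avg_conf - acc)
    return ece


# Toy data only: two predictions at 0.9 confidence, one of them wrong.
ece = expected_calibration_error([0.9, 0.9], [True, False])
print(ece)
```

Here `confidences` would be the maximum softmax probability of the consistency head (or of the generated label token), and `correct` marks whether that prediction matched the gold label.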

### Data Tables and Snippets
* **CSV/JSON/JSONL Snippets:** Not available on the page.
* **Downloadable Artifacts:** Not available on the page.
* **Metric Values:** No raw result values provided (only target parameters for future runs).