# FEVER Pretrained GPT-2 Claim-Consistency Coupling Results

## Experiment Setup

- Model: `gpt2` (pretrained Hugging Face GPT-2 backbone)
- Dataset: `copenlu/fever_gold_evidence`
- Train samples: 50,000 | Eval samples: 5,000
- max_seq_len: 256
- Epochs: 5 | Batch size: 16 | LR: 5e-05
- Consistency loss weight: 0.5
- Epochs with lower layers frozen: 1
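
The setup above can be collected into a small config object. The sketch below is illustrative only: the field names, the additive form of the combined loss (`lm_loss + weight * cons_loss`), and the 0-indexed epoch convention for freezing are assumptions, not details confirmed by this report.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    # Values are copied from the setup list; the field names are hypothetical.
    model_name: str = "gpt2"
    dataset_name: str = "copenlu/fever_gold_evidence"
    train_samples: int = 50_000
    eval_samples: int = 5_000
    max_seq_len: int = 256
    epochs: int = 5
    batch_size: int = 16
    learning_rate: float = 5e-5
    consistency_weight: float = 0.5
    freeze_lower_epochs: int = 1


def total_loss(lm_loss: float, cons_loss: float, cfg: RunConfig) -> float:
    # Assumed combination: LM loss plus the weighted consistency loss.
    return lm_loss + cfg.consistency_weight * cons_loss


def lower_layers_frozen(epoch: int, cfg: RunConfig) -> bool:
    # Assumes 0-indexed epochs, so only epoch 0 trains with frozen lower layers.
    return epoch < cfg.freeze_lower_epochs


def steps_per_epoch(cfg: RunConfig) -> int:
    # Assumes one optimizer step per batch (no gradient accumulation).
    return math.ceil(cfg.train_samples / cfg.batch_size)


cfg = RunConfig()
```

Under these assumptions, `steps_per_epoch(cfg)` is 3,125 and `lower_layers_frozen` is true only for the first epoch.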

## Sequence Format

```
[BOS] <evidence_passage> [SEP] <claim> [LABELSEP] <label_token> [EOS]
```
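
As a rough illustration of this layout, the sketch below assembles a token list in the order shown and locates the two marker positions that the pooling definitions rely on. The helper names, the whitespace-style tokens, and the `SUPPORTS` label string are placeholders. The actual run presumably uses GPT-2 token ids and also truncates to `max_seq_len`, which is not handled here.

```python
from typing import List, Tuple

BOS, SEP, LABELSEP, EOS = "[BOS]", "[SEP]", "[LABELSEP]", "[EOS]"


def build_sequence(evidence: List[str], claim: List[str], label: str) -> List[str]:
    # Order follows the format line: evidence first, then claim, then label.
    return [BOS, *evidence, SEP, *claim, LABELSEP, label, EOS]


def marker_positions(tokens: List[str]) -> Tuple[int, int]:
    # Index of the first [SEP] and the first [LABELSEP] in the sequence.
    sep_pos = tokens.index(SEP)
    labelsep_pos = tokens.index(LABELSEP)
    return sep_pos, labelsep_pos


tokens = build_sequence(
    evidence=["Paris", "is", "in", "France"],
    claim=["Paris", "is", "French"],
    label="SUPPORTS",  # placeholder label token
)
sep_pos, labelsep_pos = marker_positions(tokens)
```

With this toy input the evidence occupies positions 1 to 4, `[SEP]` sits at position 5, and `[LABELSEP]` at position 9.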

- **Evidence pooling**: mean of hidden states at positions 1..sep_pos-1 (before [SEP])
- **Evidence strict**: same pooling as above, with non-evidence hidden states additionally zeroed on the consistency path
- **Claim pooling**: mean over positions sep_pos+1..labelsep_pos-1
- **Full pooling**: mean over all non-pad tokens
- **Random-label control**: evidence-only strict path trained against permuted consistency labels
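
The pooling regions listed above can be sketched in plain Python as follows. This is a minimal illustration on nested lists rather than tensors. The function names are hypothetical, and where exactly the strict variant applies its zeroing within the consistency path is an assumption (here: on the hidden states, before pooling).

```python
from typing import Dict, List, Sequence

Vector = List[float]


def pooling_positions(
    sep_pos: int, labelsep_pos: int, attention_mask: Sequence[int]
) -> Dict[str, List[int]]:
    # Index sets for each pooling variant, following the definitions above.
    return {
        "evidence": list(range(1, sep_pos)),             # 1..sep_pos-1
        "claim": list(range(sep_pos + 1, labelsep_pos)),  # sep_pos+1..labelsep_pos-1
        "full": [i for i, m in enumerate(attention_mask) if m],  # non-pad tokens
    }


def mean_pool(hidden: Sequence[Sequence[float]], positions: Sequence[int]) -> Vector:
    # Average the hidden vectors at the given positions, one dimension at a time.
    if not positions:
        raise ValueError("cannot pool over an empty region")
    dim = len(hidden[0])
    return [sum(hidden[p][d] for p in positions) / len(positions) for d in range(dim)]


def zero_non_evidence(hidden: Sequence[Sequence[float]], sep_pos: int) -> List[Vector]:
    # Strict variant (assumed): keep evidence positions, zero everything else.
    return [
        list(h) if 1 <= i < sep_pos else [0.0] * len(h)
        for i, h in enumerate(hidden)
    ]


# Toy hidden states: 8 positions, 2 dimensions, last position is padding.
hidden = [[float(i), 1.0] for i in range(8)]
mask = [1, 1, 1, 1, 1, 1, 1, 0]
regions = pooling_positions(sep_pos=3, labelsep_pos=5, attention_mask=mask)

evidence_vec = mean_pool(hidden, regions["evidence"])
claim_vec = mean_pool(hidden, regions["claim"])
full_vec = mean_pool(hidden, regions["full"])
strict_vec = mean_pool(zero_non_evidence(hidden, sep_pos=3), regions["evidence"])
```

The evidence range starts at 1 so that `[BOS]` is excluded, and both marker tokens fall outside the evidence and claim regions.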

## Results Table

| variant | final_lm_loss | final_cons_loss | gen_claim_acc | cls_claim_acc | cfact_cls_follows_swap | cfact_cls_follows_orig | matched_cfact_cls_follows_swap | matched_cfact_cls_follows_orig | cfact_gen_follows_swap | cfact_gen_follows_orig | shuffled_cls_acc | shuffled_gen_acc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| no_consistency_loss | 0.2471 | 0.9054 | 0.8040 | 0.5278 | 0.3500 | 0.3760 | 0.3160 | 0.3660 | 0.2900 | 0.4500 | 0.4420 | 0.4880 |
| evidence_only_pooling | 0.2174 | 0.7850 | 0.8040 | 0.4412 | 0.4620 | 0.2660 | 0.4580 | 0.2340 | 0.3000 | 0.4560 | 0.3120 | 0.4960 |
| evidence_only_strict | 0.2138 | 0.7856 | 0.8140 | 0.4384 | 0.4660 | 0.2620 | 0.4640 | 0.2300 | 0.3000 | 0.4440 | 0.3060 | 0.4760 |
| full_sequence_pooling | 0.2098 | 0.2086 | 0.8220 | 0.8358 | 0.2900 | 0.4400 | 0.3440 | 0.4460 | 0.2800 | 0.4500 | 0.4560 | 0.4680 |
| claim_only_pooling | 0.1951 | 0.1914 | 0.8380 | 0.8372 | 0.2960 | 0.4520 | 0.3380 | 0.4600 | 0.2980 | 0.4540 | 0.4720 | 0.4820 |
| evidence_only_random_labels | 0.2186 | 1.0336 | 0.8060 | 0.2312 | 0.1940 | 0.4420 | 0.2180 | 0.4200 | 0.2920 | 0.4540 | 0.3640 | 0.4900 |

## Metric and Variant Descriptions

- `matched_cfact_cls_follows_swap/orig`: counterfactual classification metrics in which the swapped-in evidence comes from a lexically similar claim with a different label
- `evidence_only_strict`: consistency head can only use evidence-region activations after non-evidence hidden states are zeroed in the consistency path
- `evidence_only_random_labels`: sanity control where the consistency head is trained on permuted labels while LM loss remains intact
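
One plausible reading of the `*_follows_swap` and `*_follows_orig` columns is sketched below: after the evidence is swapped, count how often the prediction matches the label of the swapped-in evidence versus the label of the original example. This definition and the function name are assumptions, since the report does not spell out the computation. It is consistent with the two rates in each table row summing to less than 1, because a prediction can match neither label.

```python
from typing import Dict, Sequence


def follow_rates(
    preds: Sequence[str],
    orig_labels: Sequence[str],
    swap_labels: Sequence[str],
) -> Dict[str, float]:
    # Fraction of counterfactual predictions that match each candidate label.
    if not (len(preds) == len(orig_labels) == len(swap_labels)):
        raise ValueError("all inputs must have the same length")
    if not preds:
        raise ValueError("need at least one example")
    n = len(preds)
    follows_swap = sum(p == s for p, s in zip(preds, swap_labels)) / n
    follows_orig = sum(p == o for p, o in zip(preds, orig_labels)) / n
    return {"follows_swap": follows_swap, "follows_orig": follows_orig}


# Toy example with three placeholder labels: S, R, N.
rates = follow_rates(
    preds=["S", "R", "N", "N"],
    orig_labels=["S", "S", "R", "R"],
    swap_labels=["R", "R", "N", "S"],
)
```

In the toy example, two of four predictions follow the swapped label, one follows the original label, and one matches neither.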