# FEVER Pretrained GPT-2 Claim-Consistency Coupling Results

## Experiment Setup

- Model: `gpt2` (pretrained HuggingFace GPT-2 backbone)
- Dataset: `copenlu/fever_gold_evidence`
- Train samples: 50,000 | Eval samples: 5,000
- max_seq_len: 256
- Epochs: 5 | Batch size: 16 | LR: 5e-05
- Consistency loss weight: 0.5
- Epochs with lower layers frozen: 1
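
The setup lists a consistency loss weight but does not spell out how the two losses are combined. A minimal sketch, assuming the training objective is a plain weighted sum of the LM loss and the consistency loss. The function name `combined_loss` and the additive form are assumptions for illustration, not confirmed by this report.

```python
def combined_loss(lm_loss: float, cons_loss: float, cons_weight: float = 0.5) -> float:
    """Assumed objective: lm_loss + cons_weight * cons_loss.

    cons_weight=0.5 mirrors the "Consistency loss weight" setting above;
    cons_weight=0.0 removes the consistency term entirely.
    """
    if cons_weight < 0:
        raise ValueError("cons_weight must be non-negative")
    return lm_loss + cons_weight * cons_loss


# Toy values, not taken from the results table.
with_consistency = combined_loss(1.0, 2.0)          # weight 0.5
lm_only = combined_loss(1.0, 2.0, cons_weight=0.0)  # consistency term dropped
```

Under this assumption, setting the weight to zero reduces the objective to the LM loss alone.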

## Sequence Format

```
[BOS] <evidence_passage> [SEP] <claim> [LABELSEP] <label_token> [EOS]
```
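
The template above can be illustrated with a small token-level sketch that assembles a sequence and locates the two separator positions. The helper names `build_tokens` and `marker_positions` are hypothetical, and the markers are modeled as plain strings here; the actual run presumably uses tokenizer IDs.

```python
from typing import List, Sequence, Tuple


def build_tokens(evidence: Sequence[str], claim: Sequence[str], label: str) -> List[str]:
    """Assemble [BOS] evidence [SEP] claim [LABELSEP] label [EOS] as a token list."""
    return ["[BOS]", *evidence, "[SEP]", *claim, "[LABELSEP]", label, "[EOS]"]


def marker_positions(tokens: Sequence[str]) -> Tuple[int, int]:
    """Return (sep_pos, labelsep_pos), the indices of the two separator markers."""
    tokens = list(tokens)
    return tokens.index("[SEP]"), tokens.index("[LABELSEP]")


tokens = build_tokens(
    ["Paris", "is", "in", "France"],  # evidence tokens (illustrative)
    ["Paris", "is", "French"],        # claim tokens (illustrative)
    "SUPPORTS",
)
sep_pos, labelsep_pos = marker_positions(tokens)
```

With four evidence tokens and three claim tokens, the evidence occupies positions 1..4 and the claim occupies positions 6..8.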

- **Evidence pooling**: mean of hidden states over positions 1..sep_pos-1 (the tokens between [BOS] and [SEP])
- **Claim pooling**: mean of hidden states over positions sep_pos+1..labelsep_pos-1 (the tokens between [SEP] and [LABELSEP])
- **Full pooling**: mean of hidden states over all non-pad tokens
- **LM loss**: cross-entropy at the [LABELSEP] position, where the model predicts the label token
- **Consistency loss weight**: 0.5 for every variant except `no_consistency_loss`
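
The three pooling spans can be sketched in plain Python over a list of per-token hidden vectors. This is a minimal illustration, not the training code: `mean_pool` and `pooled_views` are hypothetical names, and the sketch assumes right padding, so the non-pad tokens are positions 0..num_real_tokens-1.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(hidden: Sequence[Sequence[float]], start: int, end: int) -> Vector:
    """Mean of the hidden vectors at positions start..end (both inclusive)."""
    if end < start:
        raise ValueError("empty span")
    span = hidden[start:end + 1]
    dim = len(span[0])
    return [sum(row[d] for row in span) / len(span) for d in range(dim)]


def pooled_views(hidden: Sequence[Sequence[float]], sep_pos: int,
                 labelsep_pos: int, num_real_tokens: int) -> Dict[str, Vector]:
    """Evidence, claim, and full-sequence means as described in the list above."""
    return {
        "evidence": mean_pool(hidden, 1, sep_pos - 1),
        "claim": mean_pool(hidden, sep_pos + 1, labelsep_pos - 1),
        # Assumes right padding: real tokens are positions 0..num_real_tokens-1.
        "full": mean_pool(hidden, 0, num_real_tokens - 1),
    }


# Toy sequence: [BOS] e e [SEP] c [LABELSEP] label [EOS] [PAD]
# The hidden vector at position i is [i, 1.0]; the pad row holds large values
# so that any accidental inclusion would be obvious.
hidden = [[float(i), 1.0] for i in range(8)] + [[100.0, 100.0]]
views = pooled_views(hidden, sep_pos=3, labelsep_pos=5, num_real_tokens=8)
```

In a batched tensor implementation the same spans would typically be expressed as boolean masks, but the index arithmetic is the same.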

## Results Table

| variant | final_lm_loss | final_cons_loss | gen_claim_acc | cls_claim_acc | cfact_cls_follows_swap | cfact_cls_follows_orig | cfact_gen_follows_swap | cfact_gen_follows_orig | shuffled_cls_acc | shuffled_gen_acc |
|---|---|---|---|---|---|---|---|---|---|---|
| no_consistency_loss | 0.2541 | 4.2136 | 0.8040 | 0.3980 | 0.3660 | 0.3000 | 0.2820 | 0.4540 | 0.3040 | 0.4860 |
| evidence_only_pooling | 0.2153 | 0.7870 | 0.8160 | 0.4410 | 0.4800 | 0.2520 | 0.3000 | 0.4520 | 0.3100 | 0.4860 |
| full_sequence_pooling | 0.2080 | 0.2076 | 0.8300 | 0.8404 | 0.3020 | 0.4420 | 0.2820 | 0.4460 | 0.4660 | 0.4780 |
| claim_only_pooling | 0.1963 | 0.1929 | 0.8260 | 0.8334 | 0.2860 | 0.4660 | 0.2940 | 0.4700 | 0.4740 | 0.4860 |

## Metric Descriptions

- `cls_claim_acc`: 3-way classification accuracy (SUPPORTS/REFUTES/NEI) on the eval set
- `gen_claim_acc`: generation accuracy (greedy decode of the label token after the [LABELSEP] prompt)
- `cfact_cls_follows_swap`: fraction of counterfactual examples where the classifier predicts the label of the *swapped evidence*
- `cfact_cls_follows_orig`: fraction where the classifier still predicts the original claim label despite the swap
- `cfact_gen_follows_swap`, `cfact_gen_follows_orig`: the same two fractions, measured via generation
- `shuffled_cls_acc`, `shuffled_gen_acc`: classification and generation accuracy when evidence-claim pairs are randomly mismatched
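
The two counterfactual fractions can be computed from three aligned label lists. The sketch below is a hedged illustration: `counterfactual_rates` is a hypothetical helper, and it assumes each counterfactual example carries both its original label and the label implied by the swapped evidence. In the results table the two fractions sum to less than 1 for every variant, which is consistent with a three-way label space in which a prediction can match neither label.

```python
from typing import Dict, Sequence


def counterfactual_rates(preds: Sequence[str], orig_labels: Sequence[str],
                         swap_labels: Sequence[str]) -> Dict[str, float]:
    """Fractions of predictions matching the swapped label, the original label, or neither."""
    if not (len(preds) == len(orig_labels) == len(swap_labels)):
        raise ValueError("all inputs must have the same length")
    n = len(preds)
    if n == 0:
        raise ValueError("need at least one example")
    follows_swap = sum(p == s for p, s in zip(preds, swap_labels))
    follows_orig = sum(p == o for p, o in zip(preds, orig_labels))
    neither = sum(p != s and p != o
                  for p, s, o in zip(preds, swap_labels, orig_labels))
    return {
        "follows_swap": follows_swap / n,
        "follows_orig": follows_orig / n,
        "neither": neither / n,
    }


# Illustrative labels only, not drawn from the eval set.
preds = ["SUPPORTS", "REFUTES", "NEI", "REFUTES"]
orig = ["SUPPORTS", "SUPPORTS", "REFUTES", "NEI"]
swap = ["REFUTES", "REFUTES", "SUPPORTS", "SUPPORTS"]
rates = counterfactual_rates(preds, orig, swap)
```

If a swap happens to leave the label unchanged, that example would count toward both fractions, so this sketch is most meaningful when the swapped label differs from the original.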

## Variant Descriptions

| variant | description |
|---|---|
| `no_consistency_loss` | LM loss only; the consistency head is present but contributes no gradient |
| `evidence_only_pooling` | Pool mean of hidden states over evidence tokens (before [SEP]) |
| `full_sequence_pooling` | Pool mean of hidden states over all non-padding tokens |
| `claim_only_pooling` | Pool mean of hidden states over claim tokens (between [SEP] and [LABELSEP]); negative control |