Variants to run:
1. pretrained_frozen # Zero-shot, no training
2. pretrained_lm_only # Fine-tune on mismatched pairs (corruption test)
3. pretrained_consistency # LM + consistency loss (main hypothesis)
4. pretrained_claim_only # Consistency on claims, not explanations
5. pretrained_expl_only_verifier # Claims verified from text only
6. pretrained_surface_bottleneck # Consistency from token logits
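
One possible way to keep these variant names consistent across run scripts is a small registry, sketched below. The names and descriptions are copied from the list above. The `VARIANTS` mapping and the `get_variant` helper are hypothetical and not part of any existing codebase; how each variant is actually trained is not specified here, so the sketch only validates names.

```python
# Variant names and descriptions are taken from the list above.
# VARIANTS and get_variant are illustrative assumptions, not an existing API.
VARIANTS = {
    "pretrained_frozen": "Zero-shot, no training",
    "pretrained_lm_only": "Fine-tune on mismatched pairs (corruption test)",
    "pretrained_consistency": "LM + consistency loss (main hypothesis)",
    "pretrained_claim_only": "Consistency on claims, not explanations",
    "pretrained_expl_only_verifier": "Claims verified from text only",
    "pretrained_surface_bottleneck": "Consistency from token logits",
}


def get_variant(name: str) -> str:
    """Return the description for a variant, failing loudly on a typo."""
    try:
        return VARIANTS[name]
    except KeyError:
        known = ", ".join(VARIANTS)
        raise ValueError(f"Unknown variant {name!r}; expected one of: {known}") from None


if __name__ == "__main__":
    # Dicts preserve insertion order, so this follows the run order above.
    for index, (name, description) in enumerate(VARIANTS.items(), start=1):
        print(f"{index}. {name}: {description}")
```

Looking names up through one function means a misspelled variant stops the run immediately instead of silently falling back to a default.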
