# LeanCheck: Coupling Informal Rationales to Formal Proof-Checker Outcomes

## Motivation
LeanCheck isolates verifier-coupled reasoning from open-ended proof search. The model consumes a Lean theorem, a candidate proof, and a natural-language rationale, while labels come from Lean when available or from deterministic checker-derived templates in fallback mode.
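
The two label sources described above can be sketched as follows. This is a minimal illustration, not the project's code: the function names `lean_check` and `label_example`, the temporary file name, and the returned `(label, source)` tuple are all hypothetical, and the sketch assumes that a `lean` executable on `PATH` exits with a non-zero status when a file fails to check.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def lean_check(source: str, lean_bin: str = "lean", timeout: float = 60.0) -> bool:
    """Write `source` to a temporary .lean file and run the Lean binary on it.

    Assumption: a zero exit status means the file was accepted.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "Candidate.lean"  # hypothetical file name
        path.write_text(source, encoding="utf-8")
        result = subprocess.run(
            [lean_bin, str(path)], capture_output=True, text=True, timeout=timeout
        )
    return result.returncode == 0


def label_example(
    source: str,
    template_label: str,
    checker: Optional[Callable[[str], bool]] = None,
) -> Tuple[str, str]:
    """Return (label, label_source), preferring a real checker over the template."""
    if checker is None and shutil.which("lean") is not None:
        checker = lean_check
    if checker is not None:
        return ("VERIFIES" if checker(source) else "FAILS", "checker")
    # Fallback mode: trust the deterministic template label.
    return (template_label, "template")
```

Recording the label source alongside the label makes it possible to report how many examples were checked by Lean and how many relied on the fallback. Note that a proof containing `sorry` typically produces a warning rather than an error, so a stricter check may be needed in practice.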

## Method
Sequences use `[THEOREM]`, `[PROOF]`, `[RAT]`, and `[CLAIM]` sections. A causal LM is trained with next-token loss, and a linear consistency head predicts VERIFIES/FAILS from pooled hidden states over a variant-specific span.
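
A minimal sketch of the sequence layout and span pooling follows. It is written in plain Python so that it runs without a model; whitespace tokenization, mean pooling, and the helper names (`build_sequence`, `section_span`, `mean_pool`, `linear_head`) are assumptions made for illustration and are not taken from the LeanCheck code.

```python
from typing import List, Sequence, Tuple

SECTION_TAGS = ("[THEOREM]", "[PROOF]", "[RAT]", "[CLAIM]")


def build_sequence(theorem: str, proof: str, rationale: str, claim: str) -> str:
    """Concatenate the four sections, each introduced by its tag."""
    parts = zip(SECTION_TAGS, (theorem, proof, rationale, claim))
    return " ".join(f"{tag} {text}" for tag, text in parts)


def section_span(tokens: Sequence[str], tag: str) -> Tuple[int, int]:
    """Return the [start, end) token indices of the content that follows `tag`."""
    start = list(tokens).index(tag) + 1
    end = len(tokens)
    for i in range(start, len(tokens)):
        if tokens[i] in SECTION_TAGS:
            end = i
            break
    return start, end


def mean_pool(hidden: Sequence[Sequence[float]], span: Tuple[int, int]) -> List[float]:
    """Average the hidden vectors over the positions in `span`."""
    start, end = span
    rows = hidden[start:end]
    return [sum(col) / len(rows) for col in zip(*rows)]


def linear_head(pooled: Sequence[float], weight: Sequence[Sequence[float]],
                bias: Sequence[float]) -> List[float]:
    """Two-class linear head: one logit per row of `weight`."""
    return [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weight, bias)]
```

Under this reading, a variant such as `rationale_only` would pool over the `[RAT]` span and `proof_only` over the `[PROOF]` span, while the head and the loss stay the same.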

## Dataset Construction
The run generated 1,000 training examples, 200 evaluation examples, 200 counterfactual swaps, and 200 minimal-pair rows.
Domains include natural-number equalities, propositional logic, and simple list lemmas.
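
For orientation, the following Lean 4 snippets show the kind of statement each domain might contain. They are illustrative examples written for this description, not rows from the generated dataset, and the theorem names are hypothetical.

```lean
-- Natural-number equality: `n + 0` reduces to `n` by definition, so `rfl` suffices.
theorem demo_add_zero (n : Nat) : n + 0 = n := rfl

-- Propositional logic: swap the two components of a conjunction.
theorem demo_and_swap (p q : Prop) (h : p ∧ q) : q ∧ p := ⟨h.2, h.1⟩

-- Simple list lemma: appending the empty list changes nothing.
theorem demo_append_nil {α : Type} (xs : List α) : xs ++ [] = xs :=
  List.append_nil xs
```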

## Mutation Families
Wrong lemma, wrong theorem/proof pairing, missing premise, deleted proof line, renamed variable, replacement tactic, and adversarial near-miss mutations are included.
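
Three of these families can be sketched as purely textual edits on a proof, as below. The helper names are hypothetical and the sketch assumes proofs are stored as lists of tactic lines. It does not decide the label: a mutated proof would still need to be run through the checker, since some edits (for example a consistent variable rename) may leave the proof valid.

```python
import re
from typing import List


def delete_proof_line(proof_lines: List[str], index: int) -> List[str]:
    """Deleted-proof-line mutation: drop the line at `index`."""
    return proof_lines[:index] + proof_lines[index + 1:]


def replace_tactic(proof_lines: List[str], old: str, new: str) -> List[str]:
    """Replacement-tactic mutation: swap whole-word occurrences of one tactic."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    return [pattern.sub(new, line) for line in proof_lines]


def rename_variable(text: str, old: str, new: str) -> str:
    """Renamed-variable mutation: rename whole-word occurrences of an identifier."""
    return re.sub(rf"\b{re.escape(old)}\b", new, text)


# Example on a two-line tactic proof of `p → p`.
proof = ["intro h", "exact h"]
mutants = {
    "deleted_line": delete_proof_line(proof, 1),
    "replacement_tactic": replace_tactic(proof, "exact", "apply"),
    "renamed_variable": [rename_variable(line, "h", "hp") for line in proof],
}
```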

## Results
| variant | cls_claim_acc | gen_claim_acc | cons_loss | cfact_follows_swap | cfact_follows_orig | minimal_pair_flip |
|---|---:|---:|---:|---:|---:|---:|
| lm_only | 0.490 | 0.510 | 0.830 | 0.495 | 0.495 | 0.000 |
| no_consistency_loss | 0.510 | 0.525 | 1.053 | 0.505 | 0.505 | 0.000 |
| rationale_only | 1.000 | 0.640 | 0.000 | 1.000 | 0.010 | 1.000 |
| full_sequence | 1.000 | 0.990 | 0.001 | 0.995 | 0.005 | 1.000 |
| proof_only | 1.000 | 0.915 | 0.000 | 0.010 | 1.000 | 1.000 |
| random_consistency | 0.410 | 0.775 | 0.770 | 0.400 | 0.590 | 0.070 |
| wrong_span | 0.520 | 0.540 | 0.690 | 0.495 | 0.515 | 0.000 |
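
The report does not define the counterfactual and minimal-pair columns, so the sketch below states one plausible reading as an assumption: `cfact_follows_swap` is the fraction of counterfactual predictions that match the label of the swapped-in content, `cfact_follows_orig` is the fraction that match the original label, and `minimal_pair_flip` is the fraction of minimal pairs whose two members receive different predictions. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def follow_rate(preds: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of predictions that equal the corresponding reference label."""
    if len(preds) != len(reference):
        raise ValueError("preds and reference must have the same length")
    if not preds:
        return 0.0
    return sum(p == r for p, r in zip(preds, reference)) / len(preds)


def minimal_pair_flip(pair_preds: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (accepted, rejected) pairs whose predictions differ."""
    if not pair_preds:
        return 0.0
    return sum(a != b for a, b in pair_preds) / len(pair_preds)


# Toy example with four counterfactual swaps.
preds = ["VERIFIES", "FAILS", "FAILS", "VERIFIES"]
swap_labels = ["VERIFIES", "FAILS", "VERIFIES", "VERIFIES"]
orig_labels = ["FAILS", "VERIFIES", "FAILS", "VERIFIES"]
follows_swap = follow_rate(preds, swap_labels)
follows_orig = follow_rate(preds, orig_labels)
```

Under this reading the two follow rates need not sum to one, because the swapped and original labels can coincide for some rows.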

## Activation Patching
When enabled, activation patching takes accepted/rejected minimal pairs and patches hidden states from a source example into a base example at selected GPT-2 layers. The LM patch effect is the shift in the logit of the source label token at the claim position; the head patch effect is the shift in the source class logit after the patched final hidden states are pooled and passed through the consistency head. Each `*_rat_minus_random` column is the corresponding `*_rat` value minus the `*_random` value.

| variant | lm_rat | lm_theorem | lm_random | lm_rat_minus_random | head_rat | head_theorem | head_random | head_rat_minus_random |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| lm_only | 6.086 | 0.000 | 0.676 | 5.410 | 0.186 | 0.000 | 0.187 | -0.001 |
| no_consistency_loss | 3.718 | 0.000 | -1.378 | 5.097 | 0.001 | 0.000 | -0.063 | 0.064 |
| rationale_only | 5.254 | 0.000 | -4.521 | 9.774 | 11.584 | 0.000 | 7.066 | 4.518 |
| full_sequence | 9.152 | 0.000 | 3.689 | 5.463 | 7.867 | 0.000 | 7.643 | 0.224 |
| proof_only | 1.410 | 0.000 | 6.618 | -5.208 | 0.000 | 0.000 | 3.758 | -3.758 |
| random_consistency | 6.135 | 0.000 | 13.628 | -7.494 | -0.054 | 0.000 | -0.079 | 0.025 |
| wrong_span | 9.574 | 0.000 | -1.167 | 10.741 | 0.000 | 0.000 | 0.014 | -0.014 |
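
The head patch effect can be illustrated with a toy computation on plain lists. This is a sketch under stated assumptions rather than the pipeline's implementation: it assumes the base and source examples are token-aligned, mean-pools over all positions, and uses hypothetical helper names (`patch_span`, `class_logit`, `patch_effect`, `contrast`).

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def patch_span(base_hidden: Matrix, source_hidden: Matrix, span: Tuple[int, int]) -> Matrix:
    """Copy the base states, then overwrite positions in `span` with source states."""
    start, end = span
    patched = [row[:] for row in base_hidden]
    patched[start:end] = [row[:] for row in source_hidden[start:end]]
    return patched


def class_logit(hidden: Matrix, weight_row: Sequence[float], bias: float = 0.0) -> float:
    """Mean-pool over all positions, then apply one row of a linear head."""
    pooled = [sum(col) / len(hidden) for col in zip(*hidden)]
    return sum(w * x for w, x in zip(weight_row, pooled)) + bias


def patch_effect(base_hidden: Matrix, source_hidden: Matrix, span: Tuple[int, int],
                 weight_row: Sequence[float], bias: float = 0.0) -> float:
    """Shift in the source-class logit caused by patching `span`."""
    before = class_logit(base_hidden, weight_row, bias)
    after = class_logit(patch_span(base_hidden, source_hidden, span), weight_row, bias)
    return after - before


def contrast(span_effects: Sequence[float], random_effects: Sequence[float]) -> float:
    """Mean effect of the target span minus mean effect of the random control."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(span_effects) - mean(random_effects)
```

The random-span control matters because patching any span perturbs the model; only the difference between the rationale-span effect and the random-span effect is evidence that the rationale positions carry label-specific information. For the `lm_only` row, for instance, `contrast([6.086], [0.676])` reproduces the reported 5.410 up to floating-point rounding.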

## Interpretation
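LeanCheck instantiates verifier-coupled reasoning with a formal proof checker. The goal is not open-ended proof synthesis but measuring whether informal rationale representations encode formal verifier outcomes. In the Results table, the consistency-trained `rationale_only` and `full_sequence` variants reach a `cls_claim_acc` of 1.000, against 0.410 to 0.520 for the `lm_only`, `no_consistency_loss`, `random_consistency`, and `wrong_span` controls. Consistency-trained rationale spans are thus substantially more predictive of Lean accept/reject labels than untrained, random-label, or wrong-span controls, suggesting that natural-language explanations can be coupled to programmatic verification signals.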

Keep the final sentence above only when the Results table supports it; otherwise, treat this run as a smoke-test validation of the pipeline rather than as a paper claim.

## Limitations
Templated rationales may make the task easier than it would be with free-form rationales. Binary accept/reject classification is simpler than proof synthesis. Consistency-head accuracy demonstrates decodability, not causal use; the activation-patching results reported above are a first diagnostic, and more extensive patching or RL is needed to show stronger causal faithfulness. The dataset mostly covers simple Lean examples unless expanded.

## Recommended Next Steps
Run the full GPT-2 configuration on GPU, enable a real Lean 4 checker for every generated example, expand mutation coverage with multiclass error labels, and extend activation patching for causal diagnostics.
