# Supplementary Material: Verifier-Gated Lean Formalization

This bundle accompanies the anonymous AI4Math submission. It contains the
review artifact, a reproducible workflow ledger, and scripts that regenerate the
reported counts and the showcase axiom audit.

## What To Verify First

If you only have 10 minutes:

1. Build the artifact with `lake build FormalSLT`.
2. Run the axiom audit on `examples/CheckShowcaseTheorems.lean`.
3. Open `formalslt/THEOREM_MANIFEST.md` to inspect the audited declarations.
4. Open `workflow_ledger.csv` and `LEDGER_SUMMARY.md` to inspect PR throughput.
5. Regenerate snapshot counts and compare them with Table 1 in the paper.

Expected results:

- `lake build FormalSLT` succeeds.
- The axiom audit reports only `propext`, `Classical.choice`, and `Quot.sound`.
- The snapshot and ledger scripts reproduce the counts in Table 1.

## Contents

```text
ai4math_supplementary/
├── README.md
├── workflow_ledger.csv
├── ci.yml
├── LEDGER_SUMMARY.md
├── SUPPLEMENTARY_EVIDENCE_APPENDIX.md
├── SUPPLEMENTARY_EVIDENCE_APPENDIX.pdf
├── SOURCE_STRUCTURE_ANALYSIS.json
├── MATHLIB_LEVERAGE_NOTE.md
├── LOCAL_REFERENCE_GRAPH_ANALYSIS.md
├── LOCAL_REFERENCE_GRAPH_ANALYSIS.json
├── LOCAL_REFERENCE_CHAIN_MATRIX.csv
├── SHOWCASE_AXIOMS.txt
├── SNAPSHOT_INVENTORY.md
├── REVIEWER_QUICKSTART.md
├── PROMPT_TEMPLATE.md
├── FAILURE_MODES.md
├── figures/
├── ANONYMIZATION_CHECKLIST.md
├── SUBMISSION_READINESS_CHECKLIST.md
├── formalslt_anonymized.tar.gz
└── scripts/
    ├── summarize_ledger.py
    ├── audit_formalslt_snapshot.py
    ├── analyze_source_structure.py
    ├── analyze_local_reference_graph.py
    ├── render_evidence_appendix_pdf.py
    └── axiom_audit.py
```

The source tarball expands to `formalslt/` and should include:

```text
formalslt/
├── FormalSLT.lean
├── FormalSLT/
├── examples/
├── lakefile.lean
├── lake-manifest.json
├── lean-toolchain
├── AGENTS.md
├── SNAPSHOT_INVENTORY.md
├── THEOREM_MANIFEST.md
├── SHOWCASE_AXIOMS.txt
├── .github/workflows/ci.yml
└── scripts/
```

## Reviewer Quickstart

For a command-by-command walkthrough, see
`REVIEWER_QUICKSTART.md`.

```bash
# Make the elan-managed Lean toolchain visible.
export PATH="$HOME/.elan/bin:$PATH"

# Unpack the anonymized source, fetch cached build artifacts, and build.
tar -xzf formalslt_anonymized.tar.gz
cd formalslt
lake exe cache get
lake build FormalSLT

# Audit the showcase theorems against the axiom allowlist.
python scripts/axiom_audit.py \
  --manifest examples/CheckShowcaseTheorems.lean \
  --allow propext Classical.choice Quot.sound

# Regenerate the snapshot counts, then return to the bundle root.
python scripts/audit_formalslt_snapshot.py .
cd ..

# Rerun the static source-structure and local-reference analyses.
python scripts/analyze_source_structure.py \
  --tarball formalslt_anonymized.tar.gz \
  --json-out SOURCE_STRUCTURE_ANALYSIS.json
python scripts/analyze_local_reference_graph.py \
  --tarball formalslt_anonymized.tar.gz \
  --json-out LOCAL_REFERENCE_GRAPH_ANALYSIS.json \
  --markdown-out LOCAL_REFERENCE_GRAPH_ANALYSIS.md \
  --matrix-csv LOCAL_REFERENCE_CHAIN_MATRIX.csv \
  --fig-dir figures

# Summarize the PR ledger.
python scripts/summarize_ledger.py workflow_ledger.csv
```

The build checks that every proof in the artifact is accepted by Lean. The
axiom audit fails unless every showcase theorem trace depends on no axioms
beyond `propext`, `Classical.choice`, and `Quot.sound`.
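To make the pass/fail rule concrete, here is a minimal sketch of an allowlist check over `#print axioms` output. It is not the bundle's `scripts/axiom_audit.py`. The function name `disallowed_axioms` is hypothetical, and the sketch assumes Lean reports each trace as `'name' depends on axioms: [...]` or `'name' does not depend on any axioms`.

```python
import re

# Allowlist taken from this README.
ALLOWED_AXIOMS = {"propext", "Classical.choice", "Quot.sound"}

# Assumed output shape: 'Decl.name' depends on axioms: [a, b, c]
# The bracketed list may wrap across lines, so the class [^\]] spans newlines.
DEPENDS_RE = re.compile(
    r"^'(.+)' depends on axioms:\s*\[([^\]]*)\]", re.MULTILINE
)


def disallowed_axioms(lean_output, allowed=ALLOWED_AXIOMS):
    """Map each declaration to the sorted axioms it uses outside `allowed`.

    Declarations reported as axiom-free never match the pattern and are
    therefore treated as passing.
    """
    violations = {}
    for decl, axiom_list in DEPENDS_RE.findall(lean_output):
        axioms = {a.strip() for a in axiom_list.split(",") if a.strip()}
        extra = sorted(axioms - set(allowed))
        if extra:
            violations[decl] = extra
    return violations
```

An empty result corresponds to a passing audit. Under these assumptions, a proof that still contains `sorry` would show up through the `sorryAx` axiom in its trace.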

This supplement package was checked locally with `lake exe cache get`,
`lake build FormalSLT`, regex scans for `sorry` and custom axioms, and the live
axiom audit over 107 showcase traces.
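A text-level scan of this kind can be sketched as follows. The patterns and the function name `find_flags` are illustrative assumptions, not the regexes used for the bundle.

```python
import re

# Hypothetical patterns; the bundle's actual regexes are not reproduced here.
SORRY_RE = re.compile(r"\bsorry\b")
AXIOM_RE = re.compile(r"^\s*(?:private\s+|protected\s+)?axiom\s")
LINE_COMMENT_RE = re.compile(r"--.*$")


def find_flags(source):
    """Return (line_number, kind) pairs for `sorry` uses and `axiom` declarations.

    Only `--` line comments are stripped. Block comments and string literals
    are not handled, so this is a coarse text check rather than a parser.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = LINE_COMMENT_RE.sub("", line)
        if SORRY_RE.search(code):
            hits.append((lineno, "sorry"))
        if AXIOM_RE.search(code):
            hits.append((lineno, "axiom"))
    return hits
```

A scan like this complements the axiom audit rather than replacing it: the regexes look at source text across all files, while the audit inspects what Lean actually reports for the showcase theorems.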

## Count Source of Truth

Do not maintain manuscript counts by hand. Regenerate them with:

```bash
python scripts/audit_formalslt_snapshot.py formalslt
python scripts/summarize_ledger.py workflow_ledger.csv
```
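As a rough illustration of the ledger side, the sketch below tallies PR outcomes from CSV text. The column name `status` and its values are assumptions for illustration; the real schema of `workflow_ledger.csv` may differ, and `scripts/summarize_ledger.py` remains the authoritative tool.

```python
import csv
import io
from collections import Counter


def tally_outcomes(csv_text, status_column="status"):
    """Count ledger rows per outcome.

    `status_column` is a hypothetical column name, not the ledger's schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[status_column].strip().lower() for row in reader)


# Tiny made-up ledger, used only to show the consistency check.
sample = "id,status\n1,merged\n2,closed\n3,merged\n"
counts = tally_outcomes(sample)
opened = sum(counts.values())
# Every opened PR should be accounted for by exactly one outcome.
assert opened == counts["merged"] + counts["closed"]
```

The same identity (opened = merged + closed without merge) is a quick sanity check on any reported PR totals.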

For the tarball audited for this bundle, the inventory is:

- 45 library modules under `FormalSLT/`.
- 46 source files including `FormalSLT.lean`.
- 20,080 source lines including `FormalSLT.lean`.
- 412 theorem/lemma declarations, counted as line-leading `theorem` or `lemma`
  keywords with comments skipped.
- 107 showcase `#print axioms` traces.
- 67 opened PRs, 60 merged, 7 closed without merge.
- 257 of 412 theorem/lemma declaration spans textually reference another local
  FormalSLT theorem/lemma name after excluding short/common names. This is a
  static source-structure proxy, not a theorem-difficulty or novelty metric.
- The conservative static declaration local-reference graph has 463 textual
  local-reference edges, 257 declarations with at least one textual
  local-reference edge, 295 declarations textually referenced by at least one
  other declaration, and a largest weak component of 177 declarations. This is
  source-level composition evidence, not Lean kernel dependency extraction or a
  theorem-difficulty metric.
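
The static source-structure counts above can be approximated along the following lines. This is a minimal sketch, not the logic of `scripts/analyze_local_reference_graph.py`: the comment stripping ignores nested block comments, the `MIN_NAME_LEN` cutoff for short/common names is a made-up value, and all function names are hypothetical.

```python
import re
from collections import Counter

# Line-leading declarations only; the name ends at whitespace or a binder.
DECL_RE = re.compile(r"^(?:theorem|lemma)\s+([^\s:({\[]+)", re.MULTILINE)
MIN_NAME_LEN = 6  # hypothetical cutoff for "short/common" names


def strip_comments(source):
    """Drop /- ... -/ blocks (non-nested only) and -- line comments."""
    source = re.sub(r"/-.*?-/", "", source, flags=re.DOTALL)
    return re.sub(r"--.*", "", source)


def split_declarations(source):
    """Return {name: span_text} for each line-leading theorem/lemma."""
    matches = list(DECL_RE.finditer(source))
    spans = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(source)
        spans[m.group(1)] = source[m.end():end]
    return spans


def reference_edges(spans, min_len=MIN_NAME_LEN):
    """Edge (a, b) when the span of `a` textually mentions the name `b`."""
    names = [n for n in spans if len(n) >= min_len]
    edges = set()
    for src, body in spans.items():
        for dst in names:
            pattern = r"(?<![\w.'])" + re.escape(dst) + r"(?![\w'])"
            if dst != src and re.search(pattern, body):
                edges.add((src, dst))
    return edges


def largest_weak_component(nodes, edges):
    """Size of the largest component when edge direction is ignored."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    sizes = Counter(find(n) for n in nodes)
    return max(sizes.values(), default=0)
```

Because the edges come from name matching in source text, they can both miss references (for example, names resolved through opened namespaces) and include spurious ones, which is why the counts are described as a proxy rather than kernel-level dependencies.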

If the paper reports different counts, include the exact source snapshot or
inventory file that reproduces them.

## Anonymization

The review bundle is anonymized. It contains no author names, GitHub handles,
email addresses, original PR numbers, or absolute local paths. The public
repository and non-anonymized provenance should be released only after the
review period.

See `ANONYMIZATION_CHECKLIST.md` before uploading the final supplement.
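
Part of such a check can be automated. The sketch below flags email addresses and home-directory paths in a piece of text. The patterns and the name `scan_text` are illustrative assumptions; they do not detect author names, GitHub handles, or PR numbers, so they supplement the checklist rather than replace it.

```python
import re

# Illustrative patterns only; extend them to match your own environment.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "home_path": re.compile(r"/(?:home|Users)/[^/\s]+"),
}


def scan_text(text):
    """Return (line_number, kind, match) triples for each suspicious string."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for m in pattern.finditer(line):
                hits.append((lineno, kind, m.group(0)))
    return hits
```

Running a scan like this over every text file in the extracted tarball, and reviewing each hit by hand, is one way to catch leftover local paths before upload.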
