# Lean2Isabelle Supplementary Artifact

This supplementary repository contains a compact, sanitized artifact for the
Lean-to-Isabelle proof-translation experiments. It includes sampled training
data, compact external-format examples, core pipeline scripts, prompt
templates, and environment instructions.

## Repository Structure

```text
.
├── dataset/
│   ├── minif2f_dsp_isa.jsonl
│   ├── split_stats.json
│   ├── schema.md
│   └── sampled_train/
│       ├── statement_sft_train_sample.jsonl
│       ├── theory_sft_train_sample.jsonl
│       ├── grpo_train_sample.jsonl
│       └── sample_manifest.json
├── pisa/
│   └── README.md
├── prompts/
├── scripts/
│   ├── pipeline.sh
│   └── train_grpo_mvp.sh
├── src/
│   ├── grpo_mvp.py
│   ├── teacher_translate.py
│   ├── train_grpo_mvp.py
│   └── verify.py
├── lean2isabelle.yml
└── lean2isabelle-ml.yml
```

## Installation

```bash
conda env create -f lean2isabelle.yml
conda activate lean2isabelle-supp
```

`lean2isabelle.yml` is the default environment for prompt construction, teacher
dry-runs, and reward smoke tests. It intentionally does not include PyTorch,
TRL, vLLM, Isabelle, PISA, or any project-specific package.
The default smoke tests do not require model weights, GPUs, Isabelle, or PISA.

For full teacher API calls or GRPO training, use the optional ML environment as
a starting point:

```bash
conda env create -f lean2isabelle-ml.yml
conda activate lean2isabelle-ml
```

The artifact is designed to run without any checkout of the full experiment
codebase. `scripts/pipeline.sh` clears `PYTHONPATH` and disables user-site
Python packages before running the smoke test.

## Smoke Test

```bash
bash scripts/pipeline.sh
```

The script compiles the core Python files, runs dry-run teacher translation for
statement and theory stages, orders sampled GRPO examples by difficulty, and
runs the GRPO-MVP prompt preparation and reward smoke test without starting
model training.

## Teacher Translation

The clean teacher-generation entry point is `src/teacher_translate.py`. It
preserves the statement and proof-stage prompt templates used by the
SFT/evaluation scripts, but omits retrieval backends, PISA filtering,
semantic-retry loops, raw logs, private endpoints, and run-state files.

```bash
python src/teacher_translate.py \
  --stage statement \
  --input dataset/sampled_train/grpo_train_sample.jsonl \
  --output outputs/statement_teacher.jsonl \
  --dry_run

python src/teacher_translate.py \
  --stage theory \
  --input dataset/sampled_train/grpo_train_sample.jsonl \
  --output outputs/theory_teacher.jsonl \
  --dry_run
```

For an OpenAI-compatible teacher API, set the key in an environment variable
and pass `--model`. The script contains no hard-coded keys, private endpoints,
local paths, logs, or PISA process management. Real API calls require the
optional ML environment, or at least `pip install openai`. Dry-run outputs
contain full prompts, and real-call outputs contain raw model generations and
provider error strings; both should be treated as derived data if run on
private inputs.

## GRPO-MVP Training

The compact training scaffold is in `src/train_grpo_mvp.py`. It includes:

- prompt construction from `dataset/sampled_train/grpo_train_sample.jsonl`;
- a bounded MVP reward wrapper where strict verifier success receives reward
  `1.0` and failed proofs receive at most `lambda_mvp`;
  - a verifier abstraction with a structural smoke-test backend and a small
    `pisa_http` adapter boundary for a local Portal-to-Isabelle wrapper;
- a TRL `GRPOTrainer` entry point for running GRPO-MVP training when model
  weights and the verifier backend are available.

Example full-training command:

```bash
export MODEL_NAME_OR_PATH="your-model-or-checkpoint"
export PISA_ENDPOINT="http://localhost:8000/verify"
bash scripts/train_grpo_mvp.sh
```

The structural verifier is only for artifact smoke tests. Reported verification
and GRPO rewards should use Portal-to-Isabelle; see `pisa/README.md`.

## Data Policy

The JSONL files are whitelist-sanitized. They exclude teacher candidates,
verifier traces, API requests, build logs, RAG snapshots, local paths, and API
keys. `sampled_train/` contains sampled Herald-ISA SFT and LeanWorkbook-ISA
GRPO training examples. `MiniF2F-DSP` is a Lean-proof external
predicted-statement set and does not include reference Isabelle statements.
Full validation and test splits are not included in this supplementary
artifact.

Release-construction tools used to sample and sanitize these files are kept outside this supplementary directory. They are not needed to run the smoke pipeline.
