# Supplementary material for the ICLR submission Goedel-Prover-V2

## Repository Structure

```text
.
├── data/                   
│   ├── omni_math.jsonl     # Formalizer evaluation dataset
│   └── sampled_train.jsonl # Sampled training dataset
├── dataset/                
│   ├── MOBench.jsonl
│   └── minif2f.jsonl
├── lean_compiler/          
│   └── repl_scheduler.py   # REPL scheduler for compilation
├── mathlib4/   
├── model_avg/
│   └── main.py             # Code for model averaging
├── scripts/                
│   └── pipeline.sh         # Main pipeline script (Inference -> Compile -> Summarize) * n
├── src/                    
│   ├── inference.py        # LLM generation (vLLM based)
│   ├── compile.py          # Verifies generated Lean code
│   └── summarize.py        # Generates reports and error feedback
├── goedelv2.yml            # Conda environment configuration
└── README.md
```

## Installation

We follow the setup of DeepSeek-Prover-V1.5, using Lean 4 (v4.9) and the corresponding Mathlib.

### 1. Install Lean 4

Follow the official instructions on the [Lean 4 installation page](https://leanprover.github.io/lean4/doc/setup.html) to install `elan` (the Lean version manager).

### 2. Set up Python Environment

Create the Conda environment from `goedelv2.yml`, which lists the required dependencies:

```bash
conda env create -f goedelv2.yml
conda activate goedelv2
```

### 3. Build Mathlib4

Compile the local Mathlib4 dependency to ensure the Lean environment is ready for proof verification:

```bash
cd mathlib4
lake build
cd ..
```

## Usage

The core of this repository is the **Iterative Correction Pipeline**, which consists of three stages:

1.  **Inference:** The LLM generates proof candidates.
2.  **Compilation:** The system attempts to compile the proofs using Lean 4.
3.  **Summarization:** Results are analyzed. If a proof fails, the error message is captured and fed back to the model for the next round (when `MAX_CORRECTION_ROUNDS` is greater than 0).
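
The three stages above can be sketched as a small loop. This is an illustrative, per-problem simplification, not the code in `src/`: the function `run_correction_loop` and the callables `generate` and `compile_proof` are hypothetical names, and the toy stubs stand in for the LLM and the Lean compiler. The actual pipeline runs each stage as a separate script over the whole dataset.

```python
from typing import Callable, Optional


def run_correction_loop(
    problem: str,
    generate: Callable[[str, Optional[str]], str],
    compile_proof: Callable[[str], Optional[str]],
    max_correction_rounds: int = 2,
) -> dict:
    """Return the first proof that compiles, or the last failed attempt.

    generate(problem, feedback) returns a candidate proof (hypothetical interface).
    compile_proof(proof) returns None on success, else an error message.
    """
    feedback: Optional[str] = None
    proof = ""
    # Round 0 is the initial pass; every later round is one correction attempt.
    for round_idx in range(max_correction_rounds + 1):
        proof = generate(problem, feedback)   # 1. Inference
        error = compile_proof(proof)          # 2. Compilation
        if error is None:                     # 3. Summarization
            return {"solved": True, "round": round_idx, "proof": proof}
        feedback = error  # the compiler error becomes input to the next round
    return {"solved": False, "round": max_correction_rounds,
            "proof": proof, "error": feedback}


# Toy stand-ins for the model and the compiler (assumptions for illustration only).
def toy_generate(problem: str, feedback: Optional[str]) -> str:
    return "by simp" if feedback else "by sorry"


def toy_compile(proof: str) -> Optional[str]:
    return None if proof == "by simp" else "error: declaration uses 'sorry'"


result = run_correction_loop("theorem t : 1 + 1 = 2", toy_generate, toy_compile)
print(result["solved"], result["round"])
```

With `max_correction_rounds=0` the loop body runs exactly once, which matches the single-pass setting described in the configuration table.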

### Configuration

Before running the evaluation, modify the **Configuration Section** in `scripts/pipeline.sh` to match your environment.

Key parameters to check:

| Parameter | Description | Example |
| :--- | :--- | :--- |
| `MODEL_PATH` | Path to the model weights | `"deepseek-ai/DeepSeek-Prover-V2-7B"` |
| `DATA_PATH` | Path to the input problem JSONL | `"dataset/minif2f.jsonl"` |
| `GPUS` | Number of GPUs for vLLM inference | `4` |
| `MAX_CORRECTION_ROUNDS` | Correction loops (0 = single pass) | `2` |
| `INFERENCE_HANDLER` | Prompting strategy | `"dpskcot"` |
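
Before launching a long run, it can help to confirm that the file behind `DATA_PATH` is well-formed JSONL. The helper below is a minimal sketch that is not part of the repository; `check_jsonl` is a hypothetical name, and the check makes no assumption about which fields each record contains.

```python
import json
from pathlib import Path


def check_jsonl(path: str) -> int:
    """Count the records in a JSONL file; raise ValueError on a malformed line."""
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc
            count += 1
    return count
```

For example, `check_jsonl("dataset/minif2f.jsonl")` would return the number of problems in that file.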

### Running the Pipeline

Once configured, execute the pipeline using the provided bash script. This script handles the loop between inference, compilation, and summarization automatically.

```bash
# Ensure the script is executable
chmod +x scripts/pipeline.sh

# Run the pipeline
bash scripts/pipeline.sh
```

## Output

Each run writes its results to a timestamped subdirectory of `results/` (e.g., `results/run_20251122_103000`).

  * **`to_inference_codes*.json`**: The raw code generated by the model.
  * **`code_compilation_repl*.json`**: The compilation logs, indicating whether each proof succeeded or failed.
  * **`summary_round_*/`**: Human-readable reports and statistics for that round.
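
To locate the files from the most recent run, a short helper along the following lines may be convenient. It is a sketch under two assumptions that the text above does not guarantee: run directories are always named `run_<YYYYMMDD>_<HHMMSS>` as in the example (so sorting by name gives chronological order), and the listed files sit directly inside the run directory. The names `latest_run` and `list_artifacts` are hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_run(results_dir: str = "results") -> Optional[Path]:
    """Return the newest run directory, or None if there is none."""
    # Assumes zero-padded timestamps, so name order equals time order.
    runs = sorted(p for p in Path(results_dir).glob("run_*") if p.is_dir())
    return runs[-1] if runs else None


def list_artifacts(run_dir: Path) -> dict:
    """Group the output files of one run by the patterns listed above."""
    return {
        "generated": sorted(run_dir.glob("to_inference_codes*.json")),
        "compiled": sorted(run_dir.glob("code_compilation_repl*.json")),
        "summaries": sorted(p for p in run_dir.glob("summary_round_*") if p.is_dir()),
    }
```

The helper only lists paths; it does not parse the JSON files, since their schema is not described here.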