# Code Package

This package contains the code and selected outputs for the ICML AI4Math workshop submission.

## Contents

- `scripts/`: dataset construction, classification, ranking, execution checking, analysis and table-generation scripts.
- `tests/`: unit tests for representation construction, split handling, ranking summaries and execution summaries.
- `configs/`: mathlib subset selection profiles.
- `data/`: small checked JSONL and CSV datasets included for local smoke runs. Local filesystem paths in copied JSONL files are replaced by sanitized tokens.
- `results/tables/`: CSV summaries used by the paper tables.
- `results/execution/`: selected candidate-level execution caches for the 500-state execution check.
- `paper/`: LaTeX source, figures, generated tables and the ICML 2026 style files.
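
Since the copied JSONL files are meant to contain sanitized tokens rather than local filesystem paths, a quick scan can confirm that nothing machine-specific remains. The sketch below is not part of the package's scripts. The function names and the path prefixes it searches for (`/home/`, `/Users/`) are assumptions chosen for illustration, and it makes no assumption about the JSONL schema beyond one JSON value per line.

```python
import json
import re
from pathlib import Path

# Assumption: these prefixes are typical of machine-specific absolute paths.
# They are illustrative and are not taken from the package's own scripts.
LOCAL_PATH_PATTERN = re.compile(r"(/home/|/Users/)")


def iter_strings(value):
    """Yield every string (keys included) nested inside a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield key
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def find_local_paths(jsonl_path):
    """Return (line_number, string) pairs that look like local absolute paths."""
    hits = []
    with open(jsonl_path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            for text in iter_strings(record):
                if LOCAL_PATH_PATTERN.search(text):
                    hits.append((line_number, text))
    return hits


if __name__ == "__main__":
    # Scan every JSONL file under data/ and report suspicious strings.
    for path in sorted(Path("data").rglob("*.jsonl")):
        for line_number, text in find_local_paths(path):
            print(f"{path}:{line_number}: {text}")
```

Run from the package root, the script prints nothing when no matching strings are found. A match is only a hint: a string that happens to contain one of the prefixes is reported even if it is not a real local path.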

The package excludes local Git history, virtual environments, local LeanDojo/mathlib worktrees, build caches, LaTeX auxiliary files, logs and machine-specific paths.

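## Basic Checks

From the package root, run the unit tests, regenerate the paper tables and figures, and build the paper PDF: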

```bash
python -m unittest discover -s tests
python scripts/make_paper_tables.py
python scripts/make_paper_figures.py
(cd paper && latexmk -pdf -interaction=nonstopmode main.tex)
```

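## Execution Table Regeneration

The first command reads the cached results of the 500-state execution check and writes a classified cache plus the audit summary, by-strategy and error-example tables. The second reads the classified cache and the by-strategy table and writes the significance and trace-execution gap tables.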

```bash
python scripts/analyze_execution_audit.py \
  --input results/execution/state_reconstruction_direct_sample500_k5.jsonl \
  --classified-output results/execution/state_reconstruction_direct_sample500_k5_classified.jsonl \
  --summary-output results/tables/execution_audit_summary.csv \
  --by-strategy-output results/tables/execution_audit_by_strategy.csv \
  --examples-output results/tables/execution_error_examples.csv \
  --sample-size 500

python scripts/analyze_execution_significance.py \
  --classified-cache results/execution/state_reconstruction_direct_sample500_k5_classified.jsonl \
  --execution-table results/tables/execution_audit_by_strategy.csv \
  --significance-output results/tables/execution_accept_significance.csv \
  --gap-output results/tables/trace_execution_gap.csv
```
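
After both commands finish, it can be useful to confirm that every output file they name was actually written. The following is a minimal sketch of such a check, not a script shipped with the package. The output paths are copied from the commands above, while the helper name `missing_or_empty` is hypothetical.

```python
from pathlib import Path

# Output paths copied from the two regeneration commands above.
EXPECTED_OUTPUTS = [
    "results/execution/state_reconstruction_direct_sample500_k5_classified.jsonl",
    "results/tables/execution_audit_summary.csv",
    "results/tables/execution_audit_by_strategy.csv",
    "results/tables/execution_error_examples.csv",
    "results/tables/execution_accept_significance.csv",
    "results/tables/trace_execution_gap.csv",
]


def missing_or_empty(root, relative_paths=EXPECTED_OUTPUTS):
    """Return the relative paths that are absent or zero bytes under root."""
    root = Path(root)
    problems = []
    for relative in relative_paths:
        target = root / relative
        if not target.is_file() or target.stat().st_size == 0:
            problems.append(relative)
    return problems


if __name__ == "__main__":
    # Assumes the current working directory is the package root.
    problems = missing_or_empty(".")
    if problems:
        for relative in problems:
            print(f"missing or empty: {relative}")
    else:
        print("all expected outputs are present")
```

This only checks that the files exist and are non-empty. It does not validate their contents against the tables in the paper.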
