# Bundled data and external dependencies

This supplementary material is **fully self-contained for code and data**:
nothing is fetched from a private or anonymous HuggingFace org. The only
external download needed for training is the public OLMo-3 base model,
`allenai/OLMo-3-7B-Instruct-SFT`, on HuggingFace; the public evaluation
datasets are listed under "External dependencies" below.

> **No checkpoints are shipped.** Reproducing the headline pass@k numbers
> requires re-running the SFT and RL stages on the bundled data. The full
> recipe (hyperparameters, hardware, wall-clock estimates) is in
> `README.md` and the paper appendix.

## Bundled data

| Path in this zip | Stage | Contents |
|---|---|---|
| `data/rl/train_combined_v2_sft_ep3.parquet`               | Stage 2 | 9.2 MB combined RL training corpus (~9 200 prompts spanning Bridges 5×5/7×7, Pattern 3×3/4×4, Undead 3×3/4×4, Galaxies 3×3/4×4) |
| `data/sft/galaxies_3x3de_dsr/data/train-*.parquet`        | Stage 1 | 191 DSR-distilled traces, Galaxies 3×3 |
| `data/sft/galaxies_4x4de_dsr/data/train-*.parquet`        | Stage 1 | 727 traces, Galaxies 4×4 |
| `data/sft/pattern_3x3_dsr/data/train-*.parquet`           | Stage 1 | 83 traces, Pattern 3×3 |
| `data/sft/pattern_4x4_dsr/data/train-*.parquet`           | Stage 1 | 587 traces, Pattern 4×4 |
| `data/sft/undead_3x3de_dsr/data/train-*.parquet`          | Stage 1 | 899 traces, Undead 3×3 |
| `data/sft/bridges_5x5de_dsr/data/train-*.parquet`         | Stage 1 | 887 traces, Bridges 5×5 |
| `data/eval/<puzzle>/<size>_test200.parquet`               | OOD eval | 200-problem held-out puzzle test sets used in Fig. 1 (Bridges 8×8, Undead 5×5, Pattern 5×5, …) |
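
Before launching a run, it can help to confirm that the archive unpacked completely. The sketch below globs for each path pattern in the table. `BUNDLED_PATTERNS` and `missing_bundled_files` are illustrative names, not part of the shipped code, and the eval pattern assumes that `<puzzle>` and `<size>` each expand to a single path component.

```python
from pathlib import Path

# Glob patterns copied from the table above; the eval entry replaces the
# <puzzle>/<size> placeholders with wildcards (an assumption about layout).
BUNDLED_PATTERNS = [
    "data/rl/train_combined_v2_sft_ep3.parquet",
    "data/sft/galaxies_3x3de_dsr/data/train-*.parquet",
    "data/sft/galaxies_4x4de_dsr/data/train-*.parquet",
    "data/sft/pattern_3x3_dsr/data/train-*.parquet",
    "data/sft/pattern_4x4_dsr/data/train-*.parquet",
    "data/sft/undead_3x3de_dsr/data/train-*.parquet",
    "data/sft/bridges_5x5de_dsr/data/train-*.parquet",
    "data/eval/*/*_test200.parquet",
]


def missing_bundled_files(root="."):
    """Return the patterns that match no file under `root`."""
    root = Path(root)
    return [p for p in BUNDLED_PATTERNS if not any(root.glob(p))]


if __name__ == "__main__":
    missing = missing_bundled_files()
    for pattern in missing:
        print(f"MISSING: {pattern}")
    print("all bundled data found" if not missing
          else f"{len(missing)} pattern(s) unmatched")
```

Run it from the root of the unpacked zip; an empty result means every row of the table matched at least one file.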

The Stage 1 SFT corpus consists of rejection-sampled DSR puzzle traces:
only traces whose final answer verifies under the puzzle scorer were kept.
The full SFT corpus is ~70 MB across the 6 puzzle splits listed above, all
bundled.
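
The filtering rule can be sketched as follows. This is a minimal illustration, not the pipeline that built the corpus: `Trace`, `rejection_sample`, and the toy scorer are hypothetical names, and the real puzzle scorers are the ones in the code base.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Trace:
    puzzle: str        # puzzle instance (hypothetical field)
    reasoning: str     # sampled reasoning text
    final_answer: str  # answer extracted from the trace


def rejection_sample(traces: Iterable[Trace],
                     verify: Callable[[str, str], bool]) -> List[Trace]:
    """Keep only traces whose final answer the scorer accepts."""
    return [t for t in traces if verify(t.puzzle, t.final_answer)]


# Toy scorer for illustration only: the "puzzle" is a sum to evaluate.
def toy_verify(puzzle: str, answer: str) -> bool:
    a, b = puzzle.split("+")
    return answer.strip() == str(int(a) + int(b))


samples = [
    Trace("2+3", "two plus three is five", "5"),
    Trace("2+3", "guessing", "6"),
    Trace("10+1", "carry nothing", "11"),
]
kept = rejection_sample(samples, toy_verify)
print(len(kept), "of", len(samples), "traces kept")
```

Only the final answer is checked here, matching the description above; the reasoning text itself is not scored.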

## External dependencies (public HF only)

| Asset | Source | Why |
|---|---|---|
| OLMo-3-7B-Instruct-SFT      | `allenai/OLMo-3-7B-Instruct-SFT` | Base model for both SFT and RL stages |
| OlymMATH (Easy / Hard)      | `RUC-AIBOX/OlymMATH` | Out-of-domain math eval (primary metric) |
| AIME 2024                   | `Maxwell-Jia/AIME_2024` | OOD math eval |
| AIME 2025                   | `yentinglin/aime_2025`  | OOD math eval |
| MATH-500                    | `HuggingFaceH4/MATH-500` | OOD math eval |
| HMMT problems               | bundled YAML at `evaluate/custom_tasks/hmmt/` | OOD math eval |
| OMEGA Explorative           | `OMEGA-Bench/...` (cited in paper) | OOD math eval |

Apart from the bundled HMMT YAML, these are public HuggingFace assets; no
access token is needed to download them. The OLMo-3 base download is ~14 GB.
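
If you prefer to pre-fetch the public assets rather than let the training and eval scripts pull them on first use, a sketch along these lines should work. It assumes `huggingface_hub` is installed and uses its `snapshot_download` function. `EXTERNAL_ASSETS` and `download_all` are illustrative names, and treating the eval sets as `dataset`-type repos is an assumption based on the table.

```python
# Repo IDs are taken from the table above. OMEGA Explorative is left out
# because its repo ID is elided there, and HMMT is bundled.
EXTERNAL_ASSETS = [
    ("allenai/OLMo-3-7B-Instruct-SFT", "model"),
    ("RUC-AIBOX/OlymMATH", "dataset"),
    ("Maxwell-Jia/AIME_2024", "dataset"),
    ("yentinglin/aime_2025", "dataset"),
    ("HuggingFaceH4/MATH-500", "dataset"),
]


def download_all(assets=EXTERNAL_ASSETS, fetch=None):
    """Download every asset into the local HF cache and return the paths."""
    if fetch is None:
        # Imported lazily so the asset list can be inspected without
        # huggingface_hub installed.
        from huggingface_hub import snapshot_download as fetch
    return {repo: fetch(repo_id=repo, repo_type=kind) for repo, kind in assets}
```

Calling `download_all()` returns a mapping from repo ID to the local snapshot directory. The `fetch` parameter exists only so the function can be exercised without network access.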

## Reproducing pass@k numbers from the bundled data

Stage 1 SFT and Stage 2 RL both consume the bundled parquets directly —
the training scripts under `train/verl_sft/` and `train/verl_grpo/` have
been wired to the local paths above. End-to-end recipe:

```bash
# 0. Set up environment
bash scripts/setup_vllm012_venv.sh
source "$HOME/verl-vllm012/bin/activate"

# 1. Stage 1: SFT on bundled DSR-distilled puzzle traces (4× B200, ~36 hr)
bash train/verl_sft/multi_puzzle_dsr_olmo3_v2.sh

# 2. Merge SFT epoch 5 in fp32 (critical for hard-math accuracy)
python src/verl_helpers/merge_lora.py \
    --base_model allenai/OLMo-3-7B-Instruct-SFT \
    --lora_path  checkpoints/sft_run/global_step_<EP5_STEP> \
    --output_dir checkpoints/merged_ep5_fp32 \
    --torch_dtype float32

# 3. Stage 2a: vanilla GSPO (8× B200, ~50 hr)
bash train/verl_grpo/multi_puzzle_gspo_olmo3_v2_sft_ep3.sh

# 4. Stage 2b: novelty-bonus GSPO (4× B200, ~50 hr)
bash train/verl_grpo/novelty_production_gspo_28k_n4_galaxiesexact.sh

# 5. Eval (OlymMATH-Hard pass@32 example)
bash scripts/evals/eval_novelty_prod_s15_olymp.sh
python scripts/evals/compute_pass_at_k.py \
    results/.../<output_subdir> --k_values 1,8,32 --workers 8
```
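
For reference, pass@k is commonly computed with the unbiased estimator 1 − C(n−c, k) / C(n, k), where n is the number of samples drawn for a problem and c is the number of correct ones. The sketch below implements that standard estimator. Whether `compute_pass_at_k.py` uses exactly this form is an assumption, and the function name is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


# One problem with 32 samples, 2 of them correct.
for k in (1, 8, 32):
    print(f"pass@{k} = {pass_at_k(32, 2, k):.4f}")
```

A benchmark-level score is then the mean of this per-problem value over all problems in the test set.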

Total compute to fully reproduce the headline 36.0% OlymMATH-Hard
pass@32 number from scratch is approximately 4×B200 × 36 hr (SFT) +
8×B200 × 50 hr (vanilla GSPO) + 4×B200 × 50 hr (novelty GSPO)
= 144 + 400 + 200 ≈ 744 GPU-hours.

## Anonymization

* No author names, institutional affiliations, or personal HF usernames
  appear in any path or URL.
* Identifiers swept and confirmed absent at packaging time: personal HF
  usernames, author surnames, institutional domains, internal project
  names, S3 bucket names, cluster types, WandB entity strings, and any
  literal HF / WandB API tokens.
* If you find a leak, please flag it via the venue's private review channel.
