## Explanations of Supplementary Material

* **uproof_valid.jsonl**: The OOD evaluation split of the uproof dataset, used in our main experiments.
* **uproof_train.jsonl**: The remaining part of the uproof dataset. We did not use it in our experiments, but it can serve as large-scale training data for FormaRL in the future.
* **qwen_frl_uproof_pa16.jsonl**: Samples from our main experiment on Qwen2.5-Coder-7B-Instruct. We fine-tuned this model on 859 unlabeled examples and evaluated it on uproof. These samples were obtained in the pass@16 setting.
* **qwen_frl_pfnet_pa1.jsonl**: Samples from the same model in the pass@1 setting on the ProofNet dataset, included as a reference.
* **qwen_frl_minif2f_pa1.jsonl**: Samples from the same model in the pass@1 setting on the miniF2F dataset, included as a reference.
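
The files above use the `.jsonl` extension, so they can presumably be read one JSON object per line. The following is a minimal sketch for inspecting them, not part of the original release: the helper name `load_jsonl` is hypothetical, and since the record fields are not documented here, the script prints the keys it finds rather than assuming a schema.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of Python objects, skipping blank lines.

    Hypothetical helper for illustration; not shipped with the supplementary material.
    """
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Assumes the file sits in the current working directory.
    path = Path("uproof_valid.jsonl")
    if path.exists():
        records = load_jsonl(path)
        print(f"{len(records)} records")
        # The schema is an unknown here, so inspect the keys instead of assuming them.
        if records and isinstance(records[0], dict):
            print(sorted(records[0].keys()))
```

The same helper should work for the other files listed above, provided they follow the same one-object-per-line layout.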
