Keywords: Large Language Models, Large Reasoning Models, Mathematical Reasoning, Synthetic Data, RLVR
Abstract: Recent progress in reinforcement learning with verifiable rewards (RLVR) has substantially advanced the mathematical reasoning ability of large reasoning models (LRMs). However, existing datasets either rely heavily on manual annotation or are synthesized in artificial environments such as logic games. In this work, we propose a data synthesis framework that transforms formal mathematical statements into high-quality verifiable reasoning data. The framework first performs Statement Collection and Quality Control to obtain high-quality proven statements, then applies Problem Generation to convert them into verifiable math problems, and finally trains models via RLVR with a verifier. Using this framework, we synthesize 19k high-quality mathematical problems at levels 5–10 and train the F1-Reasoner series of models. Across six challenging benchmarks, F1-Reasoner consistently improves upon three open-weight models of different sizes, outperforming models such as SynLogic and Absolute-Zero that are trained on verifiable data from other environments. Moreover, mixing our data with MATH yields F1-Reasoner-Mix, which further boosts performance; notably, F1-Reasoner-Mix-8B surpasses General-Reasoner-14B while using substantially less data. Further analysis shows that F1-Reasoner generalizes to informal theorem proving and exhibits richer thinking behaviors.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 9641