Scaling Mathematical Reasoning through Data, Tools, and Generative Selection

Ivan Moshkov; Darragh Hanley; Ivan Sorokin; Shubham Toshniwal; Christof Henkel; Benedikt Schifferer; Wei Du; Igor Gitman

Published: 09 Jul 2025, Last Modified: 16 Jul 2025

Venue: AI4Math@ICML25 Poster

License: CC BY-NC-SA 4.0

Keywords: Math Reasoning, Chain-of-Thought Reasoning, Supervised Fine-tuning

Abstract: This paper presents our high-scoring submission to the AI Mathematical Olympiad - Progress Prize 2 (AIMO-2) competition. Our recipe for building state-of-the-art mathematical reasoning models relies on three key pillars. First, we create a large-scale dataset comprising 540K unique high-quality math problems, including olympiad-level problems, and their 3.2M long-reasoning solutions. Second, we develop a novel method to integrate code execution with long-reasoning models through iterative training, generation, and quality filtering, resulting in 1.7M high-quality Tool-Integrated Reasoning solutions. Third, we create a pipeline to train models to select the most promising solution from many candidates. We show that such generative solution selection (GenSelect) can significantly improve upon the majority voting baseline. Combining these ideas, we train a series of models that achieve state-of-the-art results on mathematical reasoning benchmarks. To facilitate further research, we will release our code, models, and the complete $\texttt{MathReason}$ dataset upon publication.
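
For readers unfamiliar with the majority voting baseline that GenSelect is compared against, the following is a minimal sketch of that baseline only. It does not reproduce the authors' pipeline; the function name `majority_vote`, the tie-breaking rule, and the sample answers are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent final answer among sampled solutions.

    Illustrative baseline only (hypothetical helper, not the paper's code).
    Ties go to the answer that was seen first.
    """
    if not answers:
        raise ValueError("need at least one candidate answer")
    counts = Counter(answers)
    # most_common lists equal counts in order of first occurrence.
    return counts.most_common(1)[0][0]


# Hypothetical final answers extracted from five sampled solutions.
candidates = ["42", "17", "42", "42", "17"]
print(majority_vote(candidates))
```

Majority voting looks only at the extracted final answers. Generative selection, as the abstract describes it, instead trains a model to pick the most promising solution from the candidates.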

Submission Number: 167
