SAND-Math: Using LLMs to Generate Novel, Difficult and Useful Mathematics Questions and Answers

Published: 17 Oct 2025, Last Modified: 21 Nov 2025 · MATH-AI 2025 Poster · CC BY 4.0
Keywords: Synthetic Math Datasets, Synthetic dataset, Math Dataset, Synthetic data for LLMs, Data generation pipeline, Generating Challenging or Olympiad level Math problems
TL;DR: A novel synthetic data generation pipeline for challenging math problems.
Abstract: The demand for Large Language Models (LLMs) of various sizes capable of sophisticated mathematical reasoning keeps growing. However, the development of performant mathematical LLMs is often bottlenecked by the scarcity of useful training data containing problems of significant complexity. We introduce \textbf{SAND-Math} (Synthetic Augmented Novel and Difficult Mathematics problems and solutions), a pipeline that addresses this gap by first synthesizing high-quality problems from scratch and then systematically elevating their complexity via a new \textbf{Difficulty Hiking} step. We demonstrate the effectiveness of our approach through two key findings. First, augmenting a strong post-training baseline with a small 500-sample SAND-Math dataset significantly boosts performance, outperforming the next-best synthetic dataset by $\uparrow$ 17.85 absolute points on the AIME25 benchmark. Second, in a dedicated ablation study, we show that our Difficulty Hiking process raises average problem difficulty from 5.02 to 5.98, which in turn lifts AIME25 results from 46.38\% to 49.23\%. The full generation pipeline, final dataset, and a fine-tuned model together form a practical, scalable toolkit for building more capable and efficient mathematical reasoning LLMs.
Submission Number: 148