Synthesizing Verified Mathematical Problems

Published: 10 Oct 2024, Last Modified: 31 Oct 2024, Venue: MATH-AI 24, License: CC BY 4.0
Keywords: Large Language Model; Mathematical Reasoning; Data Synthesis
Abstract: Mathematical data synthesis offers a potentially effective solution for enhancing the mathematical capabilities of large language models. However, existing methods either synthesize many rationales for existing questions, which limits question diversity, or rely on advanced proprietary models to generate new questions directly without verification, which cannot guarantee that the synthesized problems are correct. This paper introduces a novel method, mathematical data synthesis through Algorithmic Abstraction, Implementation, and Contextualization (AIC), to synthesize new and verifiable mathematical problems. AIC abstracts mathematical problems into algorithms, implements these algorithms as code functions, and contextualizes them under different conditions to create new problems, which are then verified using the code functions. Experimental results on multiple challenging mathematical benchmarks show that models fine-tuned on our synthesized data outperform previous state-of-the-art models. Further experiments indicate that, when the synthesizer model is held fixed, data synthesized with AIC is both more accurate and more effective at improving a model's mathematical abilities.
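To make the three stages named in the abstract concrete, here is a toy sketch in Python. It is not the authors' pipeline: the abstract does not give implementation details, so the function names (`travel_distance`, `contextualize`, `verify`), the `Problem` container, and the hand-written templates are all hypothetical stand-ins. In particular, the templates stand in for whatever a language model would generate in the contextualization step.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict


# Abstraction + implementation (illustrative): a constant-speed word problem
# is reduced to the algorithm "distance = speed * time" and written as code.
def travel_distance(speed: int, hours: int) -> int:
    return speed * hours


@dataclass
class Problem:
    question: str            # natural-language problem statement
    params: Dict[str, int]   # concrete conditions used to instantiate it
    answer: int              # reference answer computed by the code function


# Hypothetical context templates; assumed here in place of model-written text.
TEMPLATES = [
    "A train travels at {speed} km/h for {hours} hours. How many km does it cover?",
    "A cyclist rides at {speed} km/h for {hours} hours. How far does she ride, in km?",
]


def contextualize(solver: Callable[..., int], rng: random.Random) -> Problem:
    """Create a new problem by sampling conditions and a surface context."""
    params = {"speed": rng.randint(10, 120), "hours": rng.randint(1, 9)}
    question = rng.choice(TEMPLATES).format(**params)
    return Problem(question=question, params=params, answer=solver(**params))


def verify(problem: Problem, solver: Callable[..., int], candidate: int) -> bool:
    """Check a candidate answer by re-running the code function."""
    return solver(**problem.params) == candidate
```

Calling `contextualize(travel_distance, random.Random(0))` yields one synthesized problem, and `verify` accepts a candidate answer only if it matches what the code function returns for the same parameters. That check against executable code is the property the abstract contrasts with unverified direct generation.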
Concurrent Submissions: N/A
Submission Number: 34