Neuro-Symbolic Data Generation for Math Reasoning

Zenan Li; Zhi Zhou; Yuan Yao; Xian Zhang; Yu-Feng Li; Chun Cao; Fan Yang; Xiaoxing Ma

Published: 25 Sept 2024, Last Modified: 19 Dec 2024, NeurIPS 2024 poster, CC BY 4.0

Keywords: Neuro-symbolic AI, Large language models, Mathematical reasoning, Data generation

TL;DR: A neuro-symbolic framework for generating high-quality, supervised mathematical datasets

Abstract: A critical question about Large Language Models (LLMs) is whether their apparent deficiency in mathematical reasoning is inherent, or merely a result of insufficient exposure to high-quality mathematical data. To explore this, we developed an automated method for generating high-quality, supervised mathematical datasets. The method carefully mutates existing math problems, ensuring both the diversity and the validity of the newly generated problems. This is achieved by a neuro-symbolic data generation framework that combines the intuitive informalization strengths of LLMs with the precise symbolic reasoning of math solvers, along with projected Markov chain Monte Carlo sampling in the highly irregular symbolic space. Empirical experiments demonstrate the high quality of the data generated by the proposed method, and show that LLMs, specifically LLaMA-2 and Mistral, surpass their state-of-the-art counterparts when realigned with the generated data.

Supplementary Material: zip

Primary Area: Natural language processing

Submission Number: 9836
