Evaluating the Effectiveness of Human-Annotated Math Statements on Olympiad-Level Math Problems

ACL ARR 2025 May Submission508 Authors

13 May 2025 (modified: 03 Jul 2025) · CC BY 4.0
Abstract: Math statements, including definitions, theorems, axioms, lemmas, and formulas, express mathematical concepts clearly and precisely, which helps in constructing the logical arguments and proofs required for mathematical reasoning. However, there has been little systematic research into the role of math statements in solving math problems of Olympiad-level difficulty. In this paper, we conduct extensive experiments to evaluate the mathematical reasoning performance of multiple cutting-edge large language models (LLMs) prompted with and without math statements. We find that problem-aligned math statements can substantially enhance the problem-solving capabilities of LLMs on complex Olympiad-level math problems. This enhancement is particularly pronounced in smaller-scale models such as Qwen2.5-Math-7B, where our curated math statements yield accuracy gains of over 10%; even an advanced deep reasoning model such as QwQ-32B still shows a 3.5% accuracy improvement. Moreover, we construct the SA-Math dataset, which comprises 114 human-annotated Olympiad-level math problems together with 130 domain-relevant math statements. We believe our work can help improve the math-problem-solving capabilities of LLMs.
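To make the evaluation protocol in the abstract concrete, the sketch below shows one way to prepend problem-aligned math statements to a prompt and compare accuracy across the two conditions. This is a minimal sketch, not the authors' actual harness: the SA-Math field names (`problem`, `answer`, `statements`) and the `query_model` stub are hypothetical placeholders introduced for illustration.

```python
# Minimal sketch of the with/without-statements comparison described in the
# abstract. Field names and the model call are hypothetical placeholders.

def query_model(prompt: str) -> str:
    """Hypothetical stub: send `prompt` to an LLM and return its final answer."""
    raise NotImplementedError("wire up your own model client here")

def build_prompt(problem: str, statements: list[str] | None) -> str:
    """Optionally prepend problem-aligned math statements before the problem."""
    parts = []
    if statements:
        parts.append("Relevant math statements:")
        parts.extend(f"- {s}" for s in statements)
    parts.append(f"Problem: {problem}")
    parts.append("Give the final answer only.")
    return "\n".join(parts)

def accuracy(dataset: list[dict], use_statements: bool) -> float:
    """Fraction of problems answered correctly under one prompting condition."""
    correct = 0
    for item in dataset:
        stmts = item.get("statements") if use_statements else None
        answer = query_model(build_prompt(item["problem"], stmts))
        correct += answer.strip() == item["answer"].strip()
    return correct / len(dataset)

# The reported gains correspond to the difference
# accuracy(dataset, use_statements=True) - accuracy(dataset, use_statements=False).
```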
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: evaluation, NLP datasets, benchmarking, math QA
Contribution Types: Data resources
Languages Studied: English
Submission Number: 508