Can Better Solvers Find Better Matches? Assessing Math-LLM Models in Similar Problem Identification

AAAI 2025 Workshop NeurMAD Submission 13 Authors

09 Dec 2024 (modified: 23 Jan 2025) · AAAI 2025 Workshop NeurMAD Submission · CC BY 4.0
Keywords: LLM and mathematical reasoning
Abstract: Researchers have adapted large language models (LLMs) for mathematical reasoning by fine-tuning them with math-specific datasets to create math-specialized LLMs. This paper evaluates such models not only on solving accuracy but also on their ability to identify similar problems. We introduce an indicator task—retrieving a similar problem given a query word problem—to assess whether the model’s internal representations of the word problems capture mathematical semantics. A model capable of solving a problem should also be adept at identifying problems requiring similar reasoning, as human experts do. Using a dataset of Probability Word Problems with formal symbolic annotations, we show that math-specialized LLMs often prioritize linguistic similarity over mathematical similarity. This underscores the need for symbolic intermediate representation during fine-tuning of a LLM to better capture mathematical essence of a problem aiding improvement in model’s consistency and reliability.
Submission Number: 13
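The indicator task described in the abstract—retrieving the most similar problem for a query word problem—can be illustrated with a minimal lexical baseline. The sketch below is an assumption for illustration only (the paper's actual retrieval setup, dataset, and model embeddings are not specified here): it scores candidates by bag-of-words cosine similarity, i.e. exactly the kind of surface-level linguistic matching the paper argues math-specialized LLMs fall back on, as opposed to matching on symbolic annotations.

```python
from collections import Counter
import math


def cosine_sim(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_most_similar(query: str, corpus: list[str]) -> str:
    """Return the corpus problem whose *surface form* is closest to the query.

    This is a purely lexical baseline; a math-aware retriever would instead
    compare symbolic representations of the underlying probability structure.
    """
    q = Counter(query.lower().split())
    return max(corpus, key=lambda p: cosine_sim(q, Counter(p.lower().split())))
```

Two problems sharing a story surface (coins, marbles) but different mathematics would be conflated by such a retriever, which is the failure mode the paper's indicator task is designed to expose.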