Abstract: Although large language models (LLMs) demonstrate strong performance across various domains, they still struggle with numerous bad cases in mathematical reasoning. Previous approaches to learning from errors synthesize training data by extrapolating solely from isolated bad cases, and thereby fail to generalize the broader error patterns shared across these cases. This paper presents Self-Error-Instruct (SEI), a framework that addresses these model weaknesses by synthesizing more generalized, targeted training data. Specifically, we evaluate a target model on two mathematical datasets, GSM8K and MATH, to pinpoint bad cases. We then extract error keyphrases from these cases based on the analysis of an instructor model (GPT-4o) and identify error types by clustering the keyphrases. Next, for each identified error type, we sample a few bad cases in each generation round and feed them to the instructor model, which synthesizes additional training data in a self-instruct fashion. This new data is refined through a one-shot learning process so that only the most effective examples are kept. Finally, we fine-tune the target model on this curated data and iterate the whole process to further enhance performance. Applying our framework to LLaMA3-8B-Instruct and Qwen2.5-Math-7B-Instruct yields average performance gains of 2.55% on in-domain evaluations and 11.19% on out-of-domain evaluations. These results demonstrate the effectiveness of self-error instruction in improving LLMs’ mathematical reasoning through error generalization. Our code and dataset are available at https://anonymous.4open.science/r/SEI-7228/README.md.
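For concreteness, the iterative loop the abstract describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' released implementation: the `TargetModel`/`Instructor` interfaces and all method names are hypothetical placeholders, and exact-match grouping of keyphrases stands in for the paper's clustering step.

```python
# A minimal sketch of one SEI iteration, as described in the abstract.
# All interfaces and method names below are hypothetical placeholders,
# not the authors' released code.

import random
from collections import defaultdict
from typing import Protocol

class TargetModel(Protocol):
    def solve(self, question: str) -> str: ...
    def improves_with(self, example: dict) -> bool: ...  # one-shot check
    def fine_tune(self, examples: list[dict]) -> None: ...

class Instructor(Protocol):  # e.g., GPT-4o behind an API
    def error_keyphrase(self, case: dict) -> str: ...
    def self_instruct(self, error_type: str, seeds: list[dict]) -> list[dict]: ...

def sei_round(target: TargetModel, instructor: Instructor,
              dataset: list[dict], per_type: int = 3) -> list[dict]:
    # 1. Pinpoint bad cases: problems the target model answers incorrectly.
    bad = [ex for ex in dataset if target.solve(ex["question"]) != ex["answer"]]

    # 2. Extract an error keyphrase per bad case, then group keyphrases
    #    into error types (exact-match grouping stands in for clustering).
    by_type: dict[str, list[dict]] = defaultdict(list)
    for case in bad:
        by_type[instructor.error_keyphrase(case)].append(case)

    # 3. For each error type, sample a few bad cases and have the
    #    instructor synthesize new examples in a self-instruct style.
    synthesized: list[dict] = []
    for error_type, cases in by_type.items():
        seeds = random.sample(cases, min(per_type, len(cases)))
        synthesized += instructor.self_instruct(error_type, seeds)

    # 4. One-shot filtering: keep only examples that individually help,
    #    then fine-tune the target model on the curated set.
    curated = [ex for ex in synthesized if target.improves_with(ex)]
    target.fine_tune(curated)
    return curated
```

In the paper's setup, `sei_round` would be called repeatedly, with each round's fine-tuned model becoming the target for the next round.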
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: mathematical NLP; data augmentation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 1670