The Reason behind Good or Bad: Towards a Better Mathematical Verifier with Natural Language Feedback

ACL ARR 2024 June Submission5250 Authors

16 Jun 2024 (modified: 05 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Mathematical verifiers achieve success in mathematical reasoning tasks by validating the correctness of solutions. However, existing verifiers are trained with binary classification labels, which are not informative enough for the model to accurately assess the solutions. To mitigate this insufficiency of binary labels, we introduce step-wise natural language feedback as rationale labels (i.e., the correctness of the current step and the corresponding explanation). In this paper, we propose **Math-Minos**, a natural language feedback-enhanced verifier built with automatically generated training data and a two-stage training paradigm for effective training and efficient inference. Our experiments reveal that a small set (30k) of natural language feedback can significantly boost the verifier's performance, improving accuracy by 1.6 percentage points (86.6% → 88.2%) on GSM8K and by 0.8 percentage points (37.8% → 38.6%) on MATH.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Mathematical Reasoning
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 5250