Premise-Augmented Reasoning Chains Improve Error Identification in Math Reasoning with LLMs

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: This paper shows that LLMs can identify the premises under which each step in a reasoning chain was written, and that using these premises improves error identification.
Abstract: Chain-of-Thought (CoT) prompting enhances mathematical reasoning in large language models (LLMs) by enabling detailed step-by-step solutions. However, due to the verbosity of LLMs, the resulting reasoning chains can be long, which makes it harder to verify individual steps and to trace errors arising from dependencies on steps that appear much earlier in the chain. Importantly, mathematical reasoning allows each step to be derived from a small set of premises, which are a subset of the preceding steps in the reasoning chain. In this paper, we present a framework that identifies the premises for each step in order to improve the evaluation of reasoning. We restructure conventional linear reasoning chains into Premise-Augmented Reasoning Chains (PARC) by introducing premise links, resulting in a directed acyclic graph where the nodes are the steps and the edges are the premise links. Through experiments on a PARC-based dataset that we built, namely PERL (Premises and ERrors identification in LLMs), we demonstrate that LLMs can reliably identify premises within complex reasoning chains. In particular, even open-source LLMs achieve 90% recall in premise identification. We also show that PARC helps to identify errors in reasoning chains more reliably: the accuracy of error identification improves by 6% to 16% absolute when step-by-step verification is carried out in PARC under the premises. Our findings highlight the utility of premise-centric representations in addressing complex problem-solving tasks and open new avenues for improving the reliability of LLM-based reasoning evaluations.
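
To make the PARC structure concrete, here is a minimal sketch in Python of representing a reasoning chain as a DAG with premise links and verifying each step only against its identified premises. This is not the authors' released code (see the repository linked below); the `Step`, `verify_step`, and `verify_chain` names and the stub verifier are illustrative assumptions.

```python
# Minimal PARC sketch (illustrative, not the paper's implementation):
# a reasoning chain as a DAG whose nodes are steps and whose edges are
# premise links, with step-by-step verification under the premises only.
from dataclasses import dataclass, field


@dataclass
class Step:
    idx: int                                            # position in the linear chain
    text: str                                           # content of the step
    premises: list[int] = field(default_factory=list)   # indices of premise steps (edges)


def verify_step(step: Step, premise_texts: list[str]) -> bool:
    """Placeholder verifier: in the paper's setting this would be an LLM call
    that judges the step given only the question and its premises, rather
    than the entire preceding chain."""
    # Hypothetical stand-in; replace with an actual LLM-based judgment.
    return bool(step.text)


def verify_chain(question: str, chain: list[Step]) -> dict[int, bool]:
    """Verify every step in the chain under its identified premises."""
    verdicts: dict[int, bool] = {}
    for step in chain:
        premise_texts = [question] + [chain[i].text for i in step.premises]
        verdicts[step.idx] = verify_step(step, premise_texts)
    return verdicts


if __name__ == "__main__":
    chain = [
        Step(0, "Alice has 3 apples and buys 2 more."),
        Step(1, "So Alice now has 3 + 2 = 5 apples.", premises=[0]),
        Step(2, "She gives away 1 apple, leaving 5 - 1 = 4.", premises=[1]),
    ]
    print(verify_chain("How many apples does Alice end with?", chain))
```

The key design point the sketch illustrates is that each verification call sees only a step's premise subset, which keeps the verifier's context short even when the full chain is long.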
Lay Summary: Large language models (LLMs) like ChatGPT can solve math problems step by step, but they often make mistakes that are hard to spot. This paper proposes a new way to trace each step’s logic by linking it only to the specific earlier steps it depends on, similar to how we show our work in math class. This forms a structure called a Premise-Augmented Reasoning Chain (PARC). Using this method and a new dataset called PERL, the authors show that LLMs can better detect both direct mistakes and hidden errors that build up over time. The approach improves the accuracy of identifying errors and could make LLMs more trustworthy for tasks that require careful reasoning.
Link To Code: https://github.com/SagnikMukherjee/PARC
Primary Area: Deep Learning->Large Language Models
Keywords: Mathematical Reasoning, Large Language Models
Submission Number: 7545