Abstract: Large language models (LLMs) are capable of self-correcting their responses by generating feedback and refining the initial output. However, their performance may sometimes decline after self-correction, either because the feedback contains errors or because they unnecessarily attempt to refine an already accurate response. To address these limitations, we investigate whether LLMs can generate meta-feedback that pinpoints errors in the feedback rather than in the response. While the ability of LLMs to generate self-feedback has been well studied, their potential to provide constructive meta-feedback remains under-explored. We design a novel self-correction prompting framework, Feedback-on-Feedback (FoF), which leverages meta-feedback to improve the feedback before refining the response. Our framework first samples multiple feedbacks for the initial response and prompts the LLM to generate meta-feedback that analyzes the inconsistencies among them. Based on the meta-feedback, the LLM generates a refined feedback that subsequently guides the revision of the response. Our FoF framework uniformly outperforms competitive baselines across two base models of different sizes and three datasets spanning arithmetic reasoning, machine translation, and programming, with an improvement of up to 1.68% on the GSM8K task with the LLaMA3-8B model.
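The abstract describes a four-step prompting pipeline (sample feedbacks, generate meta-feedback, refine the feedback, revise the response). Below is a minimal sketch of that loop, assuming a generic text-generation callable `generate`; the prompt wording, the helper name, and the number of sampled feedbacks are illustrative assumptions, not the paper's exact prompts.

```python
# Minimal sketch of the Feedback-on-Feedback (FoF) loop described in the abstract.
# `generate` is a placeholder for any chat-completion/text-generation API;
# the prompts below are illustrative, not the paper's actual templates.

from typing import Callable, List


def fof_refine(question: str, response: str,
               generate: Callable[[str], str], n_feedbacks: int = 2) -> str:
    # Step 1: sample several independent feedbacks on the initial response.
    feedbacks: List[str] = [
        generate(f"Question: {question}\nResponse: {response}\n"
                 "Provide feedback identifying any errors in the response.")
        for _ in range(n_feedbacks)
    ]

    # Step 2: meta-feedback analyzes inconsistencies between the feedbacks,
    # pinpointing errors in the feedback itself rather than in the response.
    joined = "\n\n".join(f"Feedback {i + 1}: {f}" for i, f in enumerate(feedbacks))
    meta_feedback = generate(
        f"Question: {question}\nResponse: {response}\n{joined}\n"
        "These feedbacks may disagree. Analyze their inconsistencies and "
        "point out which claims in the feedback are themselves wrong.")

    # Step 3: produce a single refined feedback guided by the meta-feedback.
    refined_feedback = generate(
        f"Question: {question}\nResponse: {response}\n{joined}\n"
        f"Meta-feedback: {meta_feedback}\n"
        "Write one corrected feedback for the response.")

    # Step 4: revise the response using the refined feedback.
    return generate(
        f"Question: {question}\nResponse: {response}\n"
        f"Feedback: {refined_feedback}\n"
        "Revise the response accordingly; keep it unchanged if already correct.")
```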
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Meta-Feedback, Self-Correction, LLMs Feedback Mechanisms, Model Interpretability, Model Evaluation
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English, Chinese
Submission Number: 5643