Abstract: Large language models (LLMs) are capable of self-correcting their responses by generating feedback and refining the initial output. However, their performance may sometimes decline after self-correction, either because the feedback contains errors or because they unnecessarily attempt to refine an already accurate response. To address these limitations, we investigate whether LLMs can generate meta-feedback that pinpoints errors in the feedback rather than in the response. While the ability of LLMs to generate self-feedback has been well studied, their potential to provide constructive meta-feedback remains under-explored. We design a novel self-correction prompting framework, Feedback-on-Feedback (FoF), which leverages meta-feedback to improve the feedback before refining the response. Our framework first samples multiple feedbacks for the initial response and prompts the LLM to generate meta-feedback that analyzes the inconsistencies among them. Based on the meta-feedback, the LLM generates a refined feedback that subsequently guides the revision of the response. Our FoF framework uniformly outperforms competitive baselines across two base models of different sizes and three datasets spanning arithmetic reasoning, machine translation, and programming, with an improvement of up to 1.68% on the GSM8K task with the LLaMA3-8B model.
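The abstract describes a four-step prompting pipeline (sample feedbacks, generate meta-feedback, refine the feedback, revise the response). Below is a minimal sketch of that loop, assuming a generic text-generation callable `generate`; the prompt wording, the helper name, and the number of sampled feedbacks are illustrative assumptions, not the paper's exact prompts.

```python
# Minimal sketch of the Feedback-on-Feedback (FoF) loop described in the abstract.
# `generate` is a placeholder for any chat-completion/text-generation API;
# the prompts below are illustrative, not the paper's actual templates.

from typing import Callable, List


def fof_refine(question: str, response: str,
               generate: Callable[[str], str], n_feedbacks: int = 2) -> str:
    # Step 1: sample several independent feedbacks on the initial response.
    feedbacks: List[str] = [
        generate(f"Question: {question}\nResponse: {response}\n"
                 "Provide feedback identifying any errors in the response.")
        for _ in range(n_feedbacks)
    ]

    # Step 2: meta-feedback analyzes inconsistencies between the feedbacks,
    # pinpointing errors in the feedback itself rather than in the response.
    joined = "\n\n".join(f"Feedback {i + 1}: {f}" for i, f in enumerate(feedbacks))
    meta_feedback = generate(
        f"Question: {question}\nResponse: {response}\n{joined}\n"
        "These feedbacks may disagree. Analyze their inconsistencies and "
        "point out which claims in the feedback are themselves wrong.")

    # Step 3: produce a single refined feedback guided by the meta-feedback.
    refined_feedback = generate(
        f"Question: {question}\nResponse: {response}\n{joined}\n"
        f"Meta-feedback: {meta_feedback}\n"
        "Write one corrected feedback for the response.")

    # Step 4: revise the response using the refined feedback.
    return generate(
        f"Question: {question}\nResponse: {response}\n"
        f"Feedback: {refined_feedback}\n"
        "Revise the response accordingly; keep it unchanged if already correct.")
```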
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Meta-Feedback, Self-Correction, LLMs Feedback Mechanisms, Model Interpretability, Model Evaluation
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English, Chinese
Submission Number: 5643