Confidence Introspection: A Self-Reflection Method for Reliable and Helpful Large Language Models

ACL ARR 2024 December Submission 1154 Authors

15 Dec 2024 (modified: 05 Feb 2025) | ACL ARR 2024 December Submission | CC BY 4.0
Abstract: Large Language Models (LLMs) suffer from factual hallucination: they confidently provide responses that are inconsistent with reality. Previous studies explored fine-tuning-based verbalized confidence calibration to mitigate hallucination, but these approaches often produce overly conservative models, compromising their ability to provide relevant knowledge. Inspired by human introspection, we propose Confidence Introspection Training, a novel approach that enables LLMs to express their confidence accurately while remaining helpful. The method follows a two-stage framework: first, it estimates the model's confidence in a question through question paraphrasing and sampling; then, using self-generated training data, the model learns to classify questions as known, uncertain, or unknown and to provide an appropriate response or relevant knowledge for each class. Experimental results demonstrate that our method effectively enhances the reliability of LLMs by expressing confidence levels accurately while preserving the model's ability to provide informative responses.
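
The abstract's first stage (paraphrase a question, sample answers, and score agreement) can be made concrete with a minimal sketch. The helper callables, the agreement measure, and the class thresholds below are illustrative assumptions, not the paper's reported components or values; the authors' paraphrasing model, sampling setup, and labeling rule may differ.

```python
# Minimal sketch of paraphrase-and-sample confidence estimation, followed by
# bucketing questions into known / uncertain / unknown for self-generated
# training data. All helpers are hypothetical stand-ins.
from collections import Counter
from typing import Callable, List


def estimate_confidence(
    question: str,
    paraphrase: Callable[[str, int], List[str]],  # returns k paraphrases of the question
    sample_answer: Callable[[str], str],          # one sampled LLM answer to a question
    normalize: Callable[[str], str],              # maps answers to a canonical form for matching
    num_paraphrases: int = 4,
    samples_per_question: int = 5,
) -> float:
    """Return the fraction of sampled answers that agree with the majority answer."""
    questions = [question] + paraphrase(question, num_paraphrases)
    answers = [
        normalize(sample_answer(q))
        for q in questions
        for _ in range(samples_per_question)
    ]
    _, majority_count = Counter(answers).most_common(1)[0]
    return majority_count / len(answers)


def label_question(confidence: float, known: float = 0.8, unknown: float = 0.3) -> str:
    """Bucket a question into the three classes used for training data generation.
    The thresholds here are placeholders, not values from the paper."""
    if confidence >= known:
        return "known"
    if confidence <= unknown:
        return "unknown"
    return "uncertain"
```

In the second stage, each question and its label would pair with a target response (a direct answer for "known", a hedged answer or relevant background knowledge for "uncertain" and "unknown") to form the self-generated fine-tuning data described above.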
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Confidence calibration, Reliability, Helpfulness
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 1154
