Confidence Introspection: A Self-Reflection Method for Reliable and Helpful Large Language Models

ACL ARR 2024 December Submission 1154 Authors

15 Dec 2024 (modified: 05 Feb 2025) | ACL ARR 2024 December Submission | CC BY 4.0
Abstract: Large Language Models (LLMs) suffer from factual hallucination: they confidently provide responses that are inconsistent with reality. Previous studies explored fine-tuning-based verbalized confidence calibration to mitigate hallucination, but these approaches often produce overly conservative models, compromising their ability to provide relevant knowledge. Inspired by human introspection, we propose Confidence Introspection Training, a novel approach that enables LLMs to express their confidence accurately while remaining helpful. The method follows a two-stage framework: first, it estimates the model's confidence in a question through question paraphrasing and sampling; then, using self-generated training data, the model learns to classify questions as known, uncertain, or unknown and to provide an appropriate response or relevant knowledge for each class. Experimental results demonstrate that our method effectively enhances the reliability of LLMs by expressing confidence levels accurately while preserving the model's ability to provide informative responses.
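
The abstract's first stage (paraphrase a question, sample answers, and score agreement) can be made concrete with a minimal sketch. The helper callables, the agreement measure, and the class thresholds below are illustrative assumptions, not the paper's reported components or values; the authors' paraphrasing model, sampling setup, and labeling rule may differ.

```python
# Minimal sketch of paraphrase-and-sample confidence estimation, followed by
# bucketing questions into known / uncertain / unknown for self-generated
# training data. All helpers are hypothetical stand-ins.
from collections import Counter
from typing import Callable, List


def estimate_confidence(
    question: str,
    paraphrase: Callable[[str, int], List[str]],  # returns k paraphrases of the question
    sample_answer: Callable[[str], str],          # one sampled LLM answer to a question
    normalize: Callable[[str], str],              # maps answers to a canonical form for matching
    num_paraphrases: int = 4,
    samples_per_question: int = 5,
) -> float:
    """Return the fraction of sampled answers that agree with the majority answer."""
    questions = [question] + paraphrase(question, num_paraphrases)
    answers = [
        normalize(sample_answer(q))
        for q in questions
        for _ in range(samples_per_question)
    ]
    _, majority_count = Counter(answers).most_common(1)[0]
    return majority_count / len(answers)


def label_question(confidence: float, known: float = 0.8, unknown: float = 0.3) -> str:
    """Bucket a question into the three classes used for training data generation.
    The thresholds here are placeholders, not values from the paper."""
    if confidence >= known:
        return "known"
    if confidence <= unknown:
        return "unknown"
    return "uncertain"
```

In the second stage, each question and its label would pair with a target response (a direct answer for "known", a hedged answer or relevant background knowledge for "uncertain" and "unknown") to form the self-generated fine-tuning data described above.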
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Confidence calibration, Reliability, Helpfulness
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 1154
