Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation

ACL ARR 2025 May Submission6165 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Accurate confidence estimation of large language models (LLMs) is crucial for improving the reliability of their generation. However, existing methods are limited by coarse-grained confidence estimation and a narrow perspective, failing to provide continuous confidence estimates throughout the generation process. In this paper, we introduce FineCE, a novel fine-grained confidence estimation method that provides accurate confidence scores during generation. Specifically, we develop a pipeline based on Monte Carlo sampling to construct training data that captures the intrinsic responses of LLMs. In addition, we propose a Backward Confidence Integration (BCI) strategy, which incorporates confidence scores from subsequent text sequences to provide a more holistic confidence estimate for the current output. We further provide three strategies to identify optimal estimation positions for efficiency optimization. Extensive experiments demonstrate that FineCE consistently outperforms existing baselines across various tasks and exhibits strong calibration capability. Our code and all baselines are available on GitHub.
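The abstract does not detail how the Monte Carlo sampling pipeline produces training labels, so the following is only a minimal illustrative sketch, not the authors' implementation: for a given generation prefix, sample several continuations and use the fraction that reach a correct answer as a confidence label for that prefix. The helpers `sample_continuation` and `is_correct` are hypothetical stand-ins for whatever decoding and answer-checking procedure the actual pipeline uses.

```python
# Sketch of a Monte Carlo confidence label for a partial generation (assumed
# interface, not the paper's code): sample N continuations of the prefix and
# return the fraction that end in a correct answer.
import random
from typing import Callable


def mc_confidence_label(
    prefix: str,
    sample_continuation: Callable[[str], str],  # hypothetical: one sampled completion of `prefix`
    is_correct: Callable[[str], bool],          # hypothetical: checks the completion's final answer
    num_samples: int = 16,
) -> float:
    """Estimate confidence as the empirical success rate over sampled continuations."""
    hits = 0
    for _ in range(num_samples):
        completion = sample_continuation(prefix)
        if is_correct(completion):
            hits += 1
    return hits / num_samples


if __name__ == "__main__":
    # Toy usage with a dummy sampler, only to show the intended interface.
    dummy_sampler = lambda p: p + (" 42" if random.random() < 0.7 else " 41")
    label = mc_confidence_label("Q: 6*7 = ? A:", dummy_sampler, lambda c: c.endswith("42"))
    print(f"estimated confidence label: {label:.2f}")
```

Such labels, computed at intermediate positions of a response, could then serve as regression targets for a fine-grained confidence estimator; how FineCE actually constructs and uses them is specified in the paper itself.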
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Generation, Question Answering, NLP Applications
Languages Studied: English
Submission Number: 6165