Mutate to Calibrate: Enhancing LLM Confidence Quantification with Diverse Semantic Mutation

ACL ARR 2025 February Submission 5949 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Large Language Models (LLMs) have brought about a transformative shift in Natural Language Processing (NLP). Despite the numerous benefits they offer, these models also present significant safety risks. Effectively addressing these risks requires robust self-evaluation frameworks, yet existing methods often suffer from overconfidence, which undermines the reliability of their evaluations. In this work, we present the Mutate-to-Calibrate (M2C) method, which improves confidence calibration by ensuring semantic diversity in training questions. M2C generates diverse question variations through semantic mutation, quantifies confidence with a self-consistency-based approach, and uses the results to construct a fine-tuning dataset, achieving confidence calibration through supervised fine-tuning. Experiments on Chinese and English LLMs show that M2C achieves effective confidence calibration and improves the accuracy of safety self-evaluations. These findings highlight the importance of semantic diversity in enhancing LLM confidence quantification and point to a promising direction for improving LLM safety evaluation.
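The pipeline summarized in the abstract (semantic mutation of questions, self-consistency-based confidence quantification, and construction of a supervised fine-tuning dataset) could be sketched roughly as below. This is a minimal illustrative sketch under assumptions, not the authors' implementation: the function names (`mutate_question`, `self_consistency_confidence`, `build_sft_example`), the paraphrase prompt, and the majority-vote confidence score are all hypothetical, and `llm` stands in for any caller-supplied text-generation function.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def mutate_question(llm: Callable[[str], str], question: str,
                    n_mutations: int = 5) -> List[str]:
    """Produce semantically diverse rephrasings of a question via a paraphrase prompt.

    The prompt wording is an assumption for illustration only.
    """
    prompt = (
        "Rewrite the following question in a different way while preserving "
        f"its meaning:\n{question}"
    )
    # Keep the original question alongside its mutated variants.
    return [question] + [llm(prompt) for _ in range(n_mutations)]


def self_consistency_confidence(llm: Callable[[str], str], variants: List[str],
                                samples_per_variant: int = 3) -> Tuple[str, float]:
    """Sample answers across all variants and score confidence as the fraction
    of samples agreeing with the majority answer (a simple self-consistency proxy)."""
    answers = [llm(v).strip() for v in variants for _ in range(samples_per_variant)]
    counts = Counter(answers)
    majority_answer, majority_count = counts.most_common(1)[0]
    return majority_answer, majority_count / len(answers)


def build_sft_example(question: str, answer: str, confidence: float) -> Dict[str, str]:
    """Pair the original question with an answer and a verbalized confidence,
    yielding one record of the calibration fine-tuning dataset."""
    return {
        "prompt": question,
        "response": f"{answer}\nConfidence: {confidence:.0%}",
    }
```

Exact string matching of answers is used here only to keep the sketch self-contained; the paper's self-consistency measure may well compare answers semantically rather than by literal equality.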
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/unfairness mitigation, reflections and critiques
Contribution Types: NLP engineering experiment
Languages Studied: Chinese, English
Submission Number: 5949