Don't Be Overconfident When You're Wrong; Don't Be Underconfident When You're Right

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: When responding to any question from the user or an API, a conversational search or question answering system should ideally be able to attach an appropriate confidence score to its output. While such systems are often overconfident, there are also situations where the system responds correctly yet lacks sufficient confidence. Underconfident responses cannot be relied upon, and therefore may not be utilised by the user or downstream tasks. Ideally, we want to know when systems are underconfident as well as when they are overconfident, and to suppress both phenomena in a balanced manner. Furthermore, in this scenario, we want an evaluation measure that is guaranteed to (a) penalise a lowered confidence for a correct response; and (b) penalise a raised confidence for an incorrect response. In light of this, we propose HMR (Harmonic Mean of Rewards) and demonstrate its advantages over existing calibration measures for our purpose by means of examples, axioms, and theorems.
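To make the two desiderata concrete, here is a minimal sketch of what a "harmonic mean of rewards" style measure could look like. The per-response reward function used below (confidence c for a correct response, 1 − c for an incorrect one) is an assumption for illustration, not the paper's actual definition; only the name HMR and properties (a) and (b) come from the abstract. The harmonic mean is dominated by the smallest rewards, so a single badly calibrated response drags the score down sharply.

```python
def hmr(confidences, correct, eps=1e-9):
    """Sketch of a Harmonic Mean of Rewards measure (hypothetical reward form).

    confidences: list of confidence scores in [0, 1], one per response.
    correct:     list of booleans, whether each response was correct.

    Assumed reward per response (NOT the paper's definition):
      reward = c     if the response is correct   (high confidence rewarded)
      reward = 1 - c if the response is incorrect (low confidence rewarded)
    """
    rewards = [c if ok else 1.0 - c for c, ok in zip(confidences, correct)]
    n = len(rewards)
    # Harmonic mean; eps guards against a zero reward.
    return n / sum(1.0 / max(r, eps) for r in rewards)
```

Under this assumed reward, lowering the confidence of a correct response lowers its reward and hence the HMR (property (a)), and raising the confidence of an incorrect response likewise lowers the HMR (property (b)).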
Paper Type: long
Research Area: Resources and Evaluation
Contribution Types: Data analysis, Theory
Languages Studied: English