Keywords: Alignment, Human-Computer Interaction, Bias, LLMs, Overconfidence, Hallucination
Abstract: We investigate the calibration of large language models' (LLMs') confidence across diverse tasks. The results of our preregistered study show that current LLMs are, like people, too sure they are right: on average, their confidence exceeds their accuracy. This tendency toward overconfidence is moderated, however, by a pronounced hard-easy effect: overconfidence is greatest on difficult tests, whereas easy tests show substantial underconfidence. We develop LifeEval, a test for evaluating model calibration across levels of difficulty.
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: agent communication, safety and alignment for agents, grounded agents, model bias/fairness evaluation, human-AI interaction/cooperation, safety and alignment
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 7024