Uncertainty as Perceptual Testimony in Vision-Language Models
Keywords: uncertainty, trust, testimony, vision-language models, calibration
Abstract: Machine learning evaluates uncertainty by calibration: do stated probabilities match empirical accuracy? Calibration suits measurement-like signals such as softmax probabilities, but it is now applied without scrutiny to natural-language confidence ("I'm 95% sure") in vision-language models (VLMs). We argue this is a category error: treating the verbal signal as a measurement leaves the testimonial dimension it also carries unscored. A verbalized confidence is not a measurement; it is testimony, a speech act governed by reasons-giving and defeater-sensitivity. In VLMs, moreover, it is perceptual testimony, the report of a witness whose warrant depends on what was visually perceived. This yields trust norms that calibration cannot supply: defeater-sensitivity, reasons–confidence linkage, and cross-modal coherence. An illustrative probe of Claude Sonnet 4.6 on 30 image-classification items under noise, occlusion, textual challenge, and empty pressure shows that the model gives perceptually specific reasons yet pins its verbalized confidence near ceiling under mild degradation, failing to track the weakening evidence it cites. Aggregate calibration diagnoses the overconfidence but misses this testimonial failure: as a perceptual witness, this model is not yet trustworthy on our probe.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 47