Detecting Whisper Hallucinations with Local Confidence Contrasts

Sam Corpataux; Anna Scius-Bertrand; Beat Wolf

Published: 28 Feb 2026, Last Modified: 13 Mar 2026. Venue: Swiss AI Days 2026 (Oral). License: CC BY 4.0

Keywords: Whisper, STT

TL;DR: We introduce a lightweight, interpretable metric called Local Confidence Drop to detect hallucinations in speech recognition models by identifying sudden breaks in contextual stability.

Abstract: Automatic speech recognition has advanced significantly with models like Whisper, yet confident hallucinations remain a critical challenge. In this work, we propose a lightweight and interpretable error detection framework that augments acoustic confidence with explicit contextual features. We introduce the Local Confidence Drop, a novel metric designed to capture sudden dips in confidence between neighboring tokens. Evaluated on the FLEURS dataset, our random forest classifier achieves an average precision (AP) of 0.64, consistently outperforming the baseline (p < 0.001). Crucially, we demonstrate that hallucinations manifest as local contextual discontinuities, providing a transparent alternative to opaque neural post-processors.
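
The abstract does not give the exact formula for the Local Confidence Drop, so the following is only an illustrative sketch of the general idea: compare each token's confidence with the average confidence of its neighbors and flag large positive gaps. The function name `local_confidence_drop`, the `window` parameter, and the sample confidence values are all assumptions made for illustration, not the authors' definition or data.

```python
from typing import List


def local_confidence_drop(confidences: List[float], window: int = 2) -> List[float]:
    """Hypothetical contrast feature (assumed definition, not the paper's).

    For each token, return the mean confidence of its neighbors within
    `window` positions on either side, minus the token's own confidence.
    A large positive value means the token is much less confident than
    its local context.
    """
    drops = []
    n = len(confidences)
    for i, c in enumerate(confidences):
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        # Neighbors exclude the token itself.
        neighbors = confidences[lo:i] + confidences[i + 1:hi]
        if not neighbors:
            # A single-token sequence has no context to contrast with.
            drops.append(0.0)
            continue
        drops.append(sum(neighbors) / len(neighbors) - c)
    return drops


# Made-up per-token confidences with one sharp dip at index 2.
token_confidences = [0.95, 0.93, 0.40, 0.94, 0.96]
drops = local_confidence_drop(token_confidences)
suspect_index = max(range(len(drops)), key=lambda i: drops[i])
```

In a pipeline like the one the abstract describes, a feature of this kind would be passed to a classifier together with the raw token confidence; how the authors actually combine them is not specified here.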

Submission Number: 7