Abstract: Automatic Speech Recognition (ASR) systems generate real-time transcriptions but often miss nuances that human interpreters capture. While ASR is useful in many contexts, interpreters, who already use ASR tools such as Dragon, add critical value, especially in sensitive settings such as diplomatic meetings, where subtle language is key. Human interpreters not only perceive these nuances but can also adjust in real time, improving accuracy, while ASR handles basic transcription tasks. However, ASR systems introduce a delay that does not align with the needs of real-time interpretation. The user-perceived latency (UPL) of ASR systems differs from that of interpretation because it measures the time between speech and transcription delivery. To address this, we propose a new approach to measuring delay in ASR systems and assess whether these systems are usable in live interpretation scenarios.
External IDs: doi:10.1109/mic.2025.3614363