Keywords: automatic speech recognition, conformal prediction, conformal risk control, large language models, ASR error correction, uncertainty quantification, adaptive hypothesis selection, N-best hypotheses, LoRA fine-tuning, word error rate, statistical guarantees, generative error correction
TL;DR: We introduce an adaptive error correction framework for speech recognition that selects the optimal number of N-best hypotheses per input using conformal risk control, achieving robust performance with smaller hypothesis sets.
Abstract: Automatic Speech Recognition (ASR) systems frequently produce transcription errors due to acoustic variability, and these errors often require post-processing correction. Recent approaches leverage Large Language Models (LLMs) for generative ASR error correction over N-best hypotheses, but they rely on a fixed hypothesis set size regardless of input complexity and provide no performance guarantees. We propose an adaptive framework that dynamically determines the optimal number of hypotheses for each input using conformal risk control (CRC). The selection mechanism leverages ASR confidence scores and applies CRC to bound the expected relative word error rate (WER) degradation with respect to the best performance achievable for a given model and hypothesis set. Experimental results show that our approach matches or exceeds fixed-size correction baselines while requiring fewer hypotheses on average, and maintains robust performance under diverse acoustic conditions.
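To make the calibration step concrete, below is a minimal sketch of how CRC might select a hypothesis-set threshold, assuming per-utterance losses (relative WER degradation, clipped to [0, B]) have been precomputed on a grid of candidate thresholds. The loss-table layout, grid ordering, and function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def crc_calibrate(losses, alpha, B=1.0):
    """Sketch of conformal risk control calibration (illustrative, not the paper's code).

    losses: (n, m) array where losses[i, j] is the relative WER degradation
        (clipped to [0, B]) on calibration utterance i under candidate
        threshold lambda_j. The grid is assumed ordered so that larger j
        admits more hypotheses, making column-mean risk non-increasing in j.
    alpha: target bound on the expected relative WER degradation.
    Returns the index of the smallest (most selective) threshold satisfying
    the CRC condition (n / (n + 1)) * R_hat(lambda) + B / (n + 1) <= alpha.
    """
    n, m = losses.shape
    risks = losses.mean(axis=0)  # empirical risk at each candidate threshold
    for j in range(m):
        if (n / (n + 1)) * risks[j] + B / (n + 1) <= alpha:
            return j
    return m - 1  # fall back to the most permissive threshold

# Toy usage with synthetic calibration losses (hypothetical numbers):
# column scales shrink so larger hypothesis sets yield lower degradation.
rng = np.random.default_rng(0)
toy_losses = np.clip(rng.random((500, 10)) * np.linspace(1.0, 0.2, 10), 0.0, 1.0)
print(crc_calibrate(toy_losses, alpha=0.3))
```

At test time, the calibrated threshold would be applied to each utterance's ASR confidence scores to decide how many N-best hypotheses to pass to the LLM corrector; the monotonicity assumption on the loss grid is what lets the CRC bound carry over to the expected degradation on new inputs.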
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 18438