From Oracle to Noisy Context: Mitigating Contextual Exposure Bias in Speech-LLMs

ACL ARR 2026 January Submission1561 Authors

30 Dec 2025 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Speech-LLMs, ASR, Contextual ASR, DPO
Abstract: Contextual automatic speech recognition (ASR) with Speech-LLMs is typically trained with oracle conversation history but must rely on error-prone history at inference, causing a train–test mismatch in the context channel that we term contextual exposure bias. We propose a unified training framework to improve robustness under realistic histories: (i) Teacher Error Knowledge, which uses Whisper large-v3 hypotheses as training-time history; (ii) Context Dropout, which regularizes over-reliance on history; and (iii) Direct Preference Optimization (DPO) on curated failure cases. Experiments on TED-LIUM 3 (in-domain) and zero-shot LibriSpeech (out-of-domain) show consistent gains under predicted-history decoding. With a two-utterance history, SFT with Whisper histories reduces WER from 5.59\% (oracle-history training) to 5.47\%, and DPO further improves it to 5.17\%. Under irrelevant-history attacks, DPO yields the smallest degradation (5.17\% $\rightarrow$ 5.63\%), indicating improved robustness to misleading context. Our code and models are published at https://anonymous.4open.science/r/Contextual_Speech_LLMs-3210.
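The Context Dropout idea in the abstract can be illustrated with a minimal sketch: during training, the conversation history fed to the model is occasionally withheld so it cannot over-rely on the context channel. The function name, the dropout probability, and the list-of-utterances representation below are all illustrative assumptions, not the authors' implementation.

```python
import random


def apply_context_dropout(history, p_drop=0.3, rng=random):
    """Hypothetical context-dropout sketch for one training example.

    With probability ``p_drop`` the model sees an empty history,
    which discourages over-reliance on (possibly erroneous) context.
    """
    if rng.random() < p_drop:
        return []  # drop the context channel for this example
    return history  # otherwise keep the (teacher-generated) history


# Usage: build the context for one utterance with a two-utterance history
history = ["previous hypothesis one", "previous hypothesis two"]
context = apply_context_dropout(history, p_drop=0.3)
```

In the framework described above, `history` would hold Whisper large-v3 hypotheses rather than oracle transcripts, so the model trains on the same noisy context it will see at inference.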
Paper Type: Long
Research Area: Speech Processing and Spoken Language Understanding
Research Area Keywords: automatic speech recognition
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 1561