Incorporating Error Level Noise Embedding for Improving LLM-Assisted Robustness in Persian Speech Recognition

ACL ARR 2026 January Submission 2516 Authors

03 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: Automatic Speech Recognition, Robust Speech Recognition, LLM-Assisted

Abstract: Automatic Speech Recognition (ASR) performance degrades severely under noise, a challenge that is particularly pronounced for low-resource languages such as Persian. Even state-of-the-art systems like Whisper exhibit substantial accuracy loss at low signal-to-noise ratios. We propose a noise-aware ASR error correction framework that combines multiple transcription hypotheses with explicit modeling of linguistic noise. From noisy Persian speech, we generate 5-best hypotheses using Whisper and introduce Error Level Noise (ELN), a representation that captures sentence- and token-level disagreement across hypotheses as a proxy for noise-induced uncertainty. ELN vectors are used to condition a fine-tuned LLaMA-2-7B model during post-hoc correction. Experiments show that ELN conditioning significantly reduces Word Error Rate (WER). On the Mixed Noise test set, our model lowers WER from 31.10% (Raw Whisper) to 24.84%, outperforming a text-only fine-tuned baseline (30.79%), while a zero-shot LLaMA-2 model fails to correct Persian ASR outputs. These results demonstrate the effectiveness of noise-aware multi-hypothesis correction for robust Persian ASR.
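The abstract relies on two quantities: WER, and disagreement across N-best hypotheses. Both can be illustrated with a small Python sketch. It is not the authors' ELN definition, which the abstract does not specify. It assumes whitespace tokenization and uses mean pairwise word-level edit distance as a stand-in sentence-level disagreement score. The function names `edit_distance`, `wer`, and `mean_pairwise_disagreement` are hypothetical.

```python
from itertools import combinations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def mean_pairwise_disagreement(hypotheses):
    """Assumed proxy for sentence-level disagreement (not the paper's ELN):
    mean normalized edit distance over all pairs of hypotheses."""
    pairs = list(combinations(hypotheses, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        ta, tb = a.split(), b.split()
        total += edit_distance(ta, tb) / max(len(ta), len(tb), 1)
    return total / len(pairs)
```

Under this sketch, identical hypotheses score 0.0, and the score grows as the N-best list diverges, which matches the abstract's framing of disagreement as a proxy for noise-induced uncertainty. For scale, the reported drop from 31.10% to 24.84% WER is 6.26 absolute points, or roughly a 20% relative reduction over Raw Whisper.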

Paper Type: Short

Research Area: Speech Processing and Spoken Language Understanding

Research Area Keywords: automatic speech recognition, ASR robustness, LLM-assisted robustness

Contribution Types: Approaches to low-resource settings

Languages Studied: Persian

Submission Number: 2516
