Large Language Model Should Understand Pinyin for Chinese ASR Error Correction

Published: 2025, Last Modified: 15 Jan 2026, ICASSP 2025, CC BY-SA 4.0
Abstract: Large language models (LLMs) can enhance automatic speech recognition (ASR) systems through generative error correction (GEC). In this paper, we propose Pinyin-enhanced GEC (PY-GEC), which leverages Pinyin, the phonetic representation of Mandarin Chinese, as supplementary information to improve Chinese ASR error correction. Our approach uses only synthetic errors for training and uses the one-best hypothesis during inference. Additionally, we introduce a multitask training approach with conversion tasks between Pinyin and text to align their feature spaces. Experiments on the Aishell-1 and Common Voice datasets demonstrate that our approach consistently outperforms GEC with text-only input. More importantly, we provide intuitive explanations for the effectiveness of PY-GEC and multitask training from two aspects: 1) increased attention weights on Pinyin features; and 2) aligned feature spaces of Pinyin and text hidden states.
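To make the idea of Pinyin as supplementary input more concrete, here is a minimal illustrative sketch of how training examples for such a setup might be assembled. The abstract does not specify the Pinyin lexicon, the prompt format, or how the synthetic errors are generated. This sketch therefore assumes homophone substitution as the error source, a tiny hypothetical character-to-Pinyin table (`TOY_PINYIN`), and hypothetical prompt templates and helper names (`to_pinyin`, `inject_homophone_error`, `build_examples`). It is not the authors' implementation. The example shows why Pinyin can help: a homophone error changes the text but leaves its Pinyin unchanged.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical toy lexicon (character -> Pinyin with tone number).
# A real system would need a full Mandarin lexicon or converter.
TOY_PINYIN = {
    "今": "jin1", "天": "tian1", "气": "qi4", "器": "qi4",
    "很": "hen3", "好": "hao3",
}

# Group characters that share the same Pinyin (homophones).
HOMOPHONES = defaultdict(list)
for char, py in TOY_PINYIN.items():
    HOMOPHONES[py].append(char)


def to_pinyin(text: str) -> str:
    """Convert text to space-separated Pinyin; unknown characters pass through."""
    return " ".join(TOY_PINYIN.get(ch, ch) for ch in text)


def inject_homophone_error(text: str, index: int) -> str:
    """Assumed error synthesis: swap the character at `index` for a homophone."""
    ch = text[index]
    candidates = [c for c in HOMOPHONES.get(TOY_PINYIN.get(ch, ""), []) if c != ch]
    if not candidates:
        return text  # no homophone available, so leave the text unchanged
    return text[:index] + candidates[0] + text[index + 1:]


@dataclass
class Example:
    task: str
    prompt: str
    target: str


def build_examples(reference: str, error_index: int) -> list[Example]:
    """Build one Pinyin-enhanced GEC example plus two conversion examples."""
    hypothesis = inject_homophone_error(reference, error_index)
    return [
        # Main task (hypothetical template): correct text given its Pinyin.
        Example("gec", f"Text: {hypothesis}\nPinyin: {to_pinyin(hypothesis)}", reference),
        # Auxiliary conversion tasks between text and Pinyin.
        Example("text2pinyin", f"Text: {reference}", to_pinyin(reference)),
        Example("pinyin2text", f"Pinyin: {to_pinyin(reference)}", reference),
    ]


if __name__ == "__main__":
    for ex in build_examples("今天天气很好", error_index=3):
        print(ex.task, "|", ex.prompt.replace("\n", " / "), "->", ex.target)
```

In this toy case, the erroneous hypothesis 今天天器很好 and the reference 今天天气很好 have identical Pinyin. The Pinyin field therefore carries evidence that the text field alone lacks.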