Decoupling Recognition and Transcription in Mandarin ASR

Jiahong Yuan, Xingyu Cai, Dongji Gao, Renjie Zheng, Liang Huang, Kenneth Church

2021 (modified: 24 Aug 2022), ASRU 2021

Abstract: Much of the recent literature on automatic speech recognition (ASR) takes an end-to-end approach. Unlike the English writing system, which is closely related to sound, Chinese characters (Hanzi) represent meaning, not sound. We propose factoring audio → Hanzi into two sub-tasks: (1) audio → Pinyin and (2) Pinyin → Hanzi, where Pinyin is a system of phonetic transcription of standard Chinese. Factoring the audio → Hanzi task in this way achieves a character error rate (CER) of 3.9% on the Aishell-1 corpus, the best result reported on this dataset so far.
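
For readers unfamiliar with the metric, CER is conventionally the character-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, divided by the reference length. The following is a minimal sketch of that standard definition, not the authors' scoring script; the function names `edit_distance` and `cer` are illustrative, and any text normalization applied in the paper's evaluation is not stated in the abstract.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference chars
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character (气 vs. 汽, both read "qi") out of six.
    print(cer("今天天气很好", "今天天汽很好"))
```

The example pair differs in a single homophone character, the kind of error a Pinyin → Hanzi stage has to resolve, since both characters share the same Pinyin syllable.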
