Abstract: Named Entity Recognition (NER) from speech is usually implemented as a two-step pipeline that (1) processes the audio with an Automatic Speech Recognition (ASR) system and (2) applies an NER tagger to the ASR output. In this paper, we incorporate pinyin (the spelled sounds of Chinese characters) into the pipeline for NER from Chinese speech, aiming to improve NER performance in two ways. First, we use the pretrained model ChineseBERT to embed pinyin, together with glyph and character information, as the input of the NER tagger. Second, we introduce homophone noise into the training data for the NER tagger, since homophone errors are likely to occur in ASR output for Chinese speech. Using the two-step pipeline with pinyin incorporated into the NER tagger, the F1 score improves by nearly 1 absolute percentage point in our experiment on the AISHELL-NER dataset, which is a significant improvement in the field of NER. The F1 score also outperforms the current state-of-the-art (SOTA) result on the AISHELL-NER dataset by 0.4 absolute percentage points, despite the slightly worse Character Error Rate (CER) of our ASR system.
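The homophone-noise idea can be illustrated with a minimal sketch. This is not the authors' implementation: the `HOMOPHONES` table, the `add_homophone_noise` function, and the `rate` parameter are hypothetical names introduced here, and a real system would presumably derive the homophone sets from a full pinyin lexicon rather than a hand-written table.

```python
import random

# Hypothetical toy table (an assumption for illustration only):
# each character maps to other characters with the same pinyin.
HOMOPHONES = {
    "京": ["经", "精"],  # jing
    "市": ["是", "事"],  # shi
}


def add_homophone_noise(chars, labels, rate, rng):
    """Replace some characters with same-sounding ones to mimic ASR errors.

    Each substitution is one character for one character, so the
    per-character NER labels stay aligned and are returned unchanged.
    """
    noisy = []
    for ch in chars:
        candidates = HOMOPHONES.get(ch)
        if candidates and rng.random() < rate:
            noisy.append(rng.choice(candidates))
        else:
            noisy.append(ch)
    return noisy, list(labels)


if __name__ == "__main__":
    rng = random.Random(0)
    chars = list("北京市")
    labels = ["B-LOC", "I-LOC", "I-LOC"]
    print(add_homophone_noise(chars, labels, 0.5, rng))
```

Because the substitution keeps the sequence length fixed, the noisy sentence can reuse the original entity labels directly, which is what lets the tagger learn to recognize entities through homophone errors.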