Build a 50+ Hours Chinese Mandarin Corpus for Children's Speech Recognition

Published: 01 Jan 2024, Last Modified: 13 May 2025, Venue: ICASSP 2024, License: CC BY-SA 4.0
Abstract: Children’s speech recognition plays an important role in educational research on children. General-purpose automatic speech recognition (ASR) systems perform unsatisfactorily on children’s speech, mainly because of the lack of child speech corpora. In recent years, many studies have addressed children’s speech recognition, but most of them build on the small amount of existing data through data augmentation, transfer learning, and similar techniques, which yields only limited improvement. The purpose of our research is to establish a children’s speech corpus for children’s speech recognition. First, we use a children’s language proficiency assessment system to collect weakly labeled speech data from children aged 3 to 12, and obtain transcribed text through a commercial-grade speech recognition interface. Then, we propose a method to quickly verify the transcribed text, screen out valid speech-label pairs by combining the characteristics of children’s pronunciation and Chinese pinyin, and construct a corpus of 50+ hours of Mandarin children’s speech. We experimented with a current mainstream end-to-end pretrained Mandarin model: fine-tuning it on the data we built achieves a 15% reduction in character error rate (CER) compared with the baseline model without fine-tuning.
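The reported gain is measured in CER, the character-level edit distance between a recognized transcript and its reference, divided by the reference length. The following is a minimal sketch of that metric in Python, not the authors' evaluation code: the function names `edit_distance` and `cer` are illustrative, and stripping spaces before scoring is an assumption about normalization.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    # Assumption: spaces carry no meaning in Mandarin transcripts, so drop them.
    reference = reference.replace(" ", "")
    hypothesis = hypothesis.replace(" ", "")
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substituted character out of six in the reference.
    print(cer("我喜欢吃苹果", "我喜欢吃平果"))
```

Because Mandarin text is scored per character rather than per word, no word segmentation is needed before computing the metric.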