Improved Noisy Iterative Pseudo-Labeling for Semi-Supervised Speech Recognition

Published: 2022 (SLT 2022); Last Modified: 11 Apr 2025; License: CC BY-SA 4.0
Abstract: Due to the high annotation cost in ASR, semi-supervised training has attracted considerable attention in both research and industry. A number of recent studies have established that pseudo-labeling (PL), a fundamental branch of semi-supervised learning, is effective for ASR. However, when iterative PL is used, the cost of data experiments is prohibitively high, which makes extending it to diverse ASR tasks difficult. In this paper, we propose an empirical scoring method based on hypothesis distribution testing to guide iterative PL training, thereby lowering the cost of data experiments and improving ASR performance. We also conduct extensive experiments to determine the necessity and limitations of model perturbation in the initial training and PL stages. On the Librispeech 100/860 task, our method improves the performance of a 12+6 transformer-based CTC+S2S architecture from 4.8%/10.1% to 3.9%/9.6% on test-clean/test-other.
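The iterative PL loop the abstract refers to can be illustrated with a minimal, self-contained sketch. This is not the paper's method: the toy nearest-class-mean "model", the 1-D data, and the margin-based confidence `threshold` are all illustrative stand-ins (in particular, the threshold filter stands in for the paper's hypothesis-distribution-based scoring, whose details are not given in the abstract).

```python
# Hypothetical sketch of iterative pseudo-labeling (PL) on toy 1-D data.
# A real setup would use an ASR model and a principled scoring method;
# here a class-mean classifier and a confidence margin stand in for both.

def train(labeled):
    # Toy "model": the mean of the points seen for each class label.
    means = {}
    for x, y in labeled:
        means.setdefault(y, []).append(x)
    return {y: sum(v) / len(v) for y, v in means.items()}

def predict(model, x):
    # Return (label, confidence); confidence is the margin between the
    # distances to the two nearest class means (larger = more confident).
    dists = sorted((abs(x - m), y) for y, m in model.items())
    conf = (dists[1][0] - dists[0][0]) if len(dists) > 1 else 1.0
    return dists[0][1], conf

def iterative_pl(labeled, unlabeled, rounds=3, threshold=0.5):
    pseudo = []
    for _ in range(rounds):
        # Retrain on ground-truth labels plus the current pseudo-labels.
        model = train(list(labeled) + pseudo)
        preds = [(x, *predict(model, x)) for x in unlabeled]
        # Keep only high-confidence pseudo-labels for the next round.
        pseudo = [(x, y) for x, y, c in preds if c >= threshold]
    return train(list(labeled) + pseudo)

labeled = [(0.0, "a"), (10.0, "b")]
unlabeled = [1.0, 2.0, 8.0, 9.0]
model = iterative_pl(labeled, unlabeled)
```

After a few rounds the class means have absorbed the confidently pseudo-labeled points, which is the mechanism iterative PL relies on; the expense the abstract highlights comes from repeating this retrain/relabel cycle with a full ASR model on large unlabeled corpora.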