Temporal Early Exiting for Streaming Speech Commands Recognition

Raphael Tang, Karun Kumar, Ji Xin, Piyush Vyas, Wenyan Li, Gefei Yang, Yajie Mao, G. Craig Murray, Jimmy Lin

Published: 2022, Last Modified: 05 Jul 2023ICASSP 2022Readers: Everyone

Abstract: Limited-vocabulary speech commands recognition is the task of classifying a short utterance as one of several speech commands, for which neural networks obtain state-of-the-art results. In particular, recurrent neural networks represent a common approach for streaming commands recognition systems. In this paper, we explore resource-efficient methods to short-circuit such systems in the time domain when the model is confident in its prediction. We propose applying a frame-level labeling objective to further improve the efficiency–accuracy trade-off. On two datasets in limited-vocabulary commands recognition, our best method achieves an average time savings of 45% of the utterance without reducing the absolute accuracy by more than 0.6 points. We show that the per-instance savings depend on the length of the unique prefix in the phonemes across a dataset.

0 Replies