Abstract: Keyword spotting (KWS) systems usually run on mobile devices, so a low-complexity design is essential for long standby time. Audio feature extraction and classifier modeling are the two main components of a KWS system. The log Mel-frequency spectral coefficient (MFSC) is a common audio feature because of its low complexity and good performance. A binary neural network (BNN) classifier, which has binary weights and activations and performs convolution with XNOR operations, is well suited to low-complexity KWS applications. However, audio features are usually quantized to multi-bit binary codes to maintain high classification accuracy, which requires addition (ADD) operations in the first convolutional layer of the BNN model. The BNN accelerator therefore needs both XNOR and ADD units. To further reduce the complexity of KWS systems, we propose a new feature extraction method: Thermometer Codes of MFSC (MFSC-TC). Because it omits the LOG and DELTA operations, it is simpler than other MFSC-based methods. More importantly, owing to the properties of thermometer codes, the convolutions in all layers can be computed with XNOR units. Experiments on the Google Speech Commands dataset show that MFSC-TC-based BNN models outperform models with more layers that use other feature extraction methods.
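The reason a thermometer code lets the first layer use XNOR units can be illustrated with a minimal sketch. It is not the authors' implementation: the number of quantization levels, the function names `thermometer_encode` and `xnor_popcount_dot`, and the convention that bit 1 stands for +1 and bit 0 for -1 are all assumptions made for illustration. The sketch encodes each quantized feature as a run of ones followed by zeros, then computes a binary dot product as `2 * popcount(XNOR) - n` and compares it with the ordinary ±1 dot product.

```python
import numpy as np

def thermometer_encode(x, levels):
    """Encode integers in [0, levels] as `levels`-bit thermometer codes.

    Bit i is 1 when x > i, so a value k becomes k ones followed by zeros.
    (`levels` is a hypothetical parameter, not taken from the paper.)
    """
    x = np.asarray(x)
    thresholds = np.arange(levels)
    return (x[..., None] > thresholds).astype(np.uint8)

def xnor_popcount_dot(a_bits, w_bits):
    """Dot product of two +/-1 vectors stored as {0, 1} bits.

    Assumes bit 1 means +1 and bit 0 means -1. Matching bits contribute +1
    and differing bits contribute -1, so the result is 2 * popcount(XNOR) - n.
    """
    xnor = 1 - (a_bits ^ w_bits)
    return 2 * int(xnor.sum()) - a_bits.size

# Three quantized feature values, each expanded to a 4-bit thermometer code.
features = np.array([0, 2, 4])
codes = thermometer_encode(features, levels=4)

# Random binary weights with the same shape as the encoded input.
rng = np.random.default_rng(0)
weights = rng.integers(0, 2, size=codes.shape, dtype=np.uint8)

# XNOR/popcount result versus the reference +/-1 dot product.
fast = xnor_popcount_dot(codes, weights)
reference = int(np.sum((2 * codes.astype(int) - 1) * (2 * weights.astype(int) - 1)))
```

Under these assumptions, every input bit is itself a binary activation, so the first layer needs no multi-bit weighting of bit positions, which is what a conventional binary-coded input would require.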
External IDs: dblp:conf/apccas/JiaoLCLA22