Abstract: Keyword spotting (KWS) is one of the major tasks required in online sensing and surveillance devices. It requires processing and analyzing audio signals, which are complex signals whose information is randomly distributed in time. In this context, this paper proposes an event-driven, low-power, time-based feature extractor that computes the spectral energy distribution in the form of a spectrogram. A convolutional neural network (CNN) then performs the classification to detect a specific keyword among multiple learned keywords. The system was tested on an Intel® DE2-115 FPGA board to demonstrate its hardware implementability and was synthesized in a 60 nm CMOS technology for power estimation. Combined simulation of the FPGA implementation and the MATLAB® model shows that the system recognizes up to 10 keywords with an accuracy of 90.4% within a power budget of 9.24 μW, for an average speech rate of 60 words/minute.