Keywords: Efficient Architectures; FLOPs; Latency
TL;DR: We compare different efficient architectures for the wake-word detection task and analyze their accuracy and latency tradeoffs.
Abstract: Wake-word detection models running on edge devices have stringent efficiency requirements.
We observe that the over-the-air test accuracy of models trained on parallel hardware (GPU/TPU) usually degrades when they are deployed for real-time, over-the-air inference on edge devices with a CPU. Further, the increase in inference time when migrating from GPU to CPU varies across models. The accuracy drop stems from hardware latency and the acoustic impulse response, while the non-uniform growth in inference time results from how differently each model exploits hardware acceleration.
We compare five Convolutional Neural Network (CNN) architectures and one pure Transformer architecture, train them for wake-word detection on the Speech Commands dataset, and quantize two representative models. We seek to quantify their accuracy-efficiency tradeoffs to inform researchers and practitioners about the key model components that influence this tradeoff.
Submission Number: 32