Dissecting Efficient Architectures for Wake-Word Detection

Cody Berger; Juncheng B Li; Yiyuan Li; Aaron Berger; Dmitri Berger; Karthik Ganesan; Emma Strubell; Florian Metze

Published: 20 Jun 2023, Last Modified: 16 Jul 2023, ES-FoMO 2023 Poster

Keywords: Efficient Architectures; FLOP; Latency

TL;DR: We compare several efficient architectures on the wake-word detection task and analyze their accuracy-latency tradeoffs.

Abstract: Wake-word detection models running on edge devices have stringent efficiency requirements. We observe that the over-the-air test accuracy of models trained on parallel devices (GPU/TPU) usually degrades when they are deployed on edge devices that use a CPU for over-the-air, real-time inference. Further, the change in inference time when migrating between GPU and CPU varies across models. The accuracy drop is due to hardware latency and acoustic impulse response, while the non-uniform growth of inference time results from the models' varying exploitation of hardware acceleration. We compare five Convolutional Neural Network (CNN) architectures and one pure Transformer architecture, train them for wake-word detection on the Speech Commands dataset, and quantize two representative models. We seek to quantify their accuracy-efficiency tradeoffs to inform researchers and practitioners about the key model components that influence this tradeoff.

Submission Number: 32
