Abstract: Deep learning has revived the field of automatic speech recognition (ASR) over the last ten years and pushed recognition accuracy to levels on par with humans. Applications like Siri, Amazon Alexa and Google Assistant are very popular, but have inherent privacy problems. In this paper, we evaluate state-of-the-art open-source ASR models for use in a smart speaker that works without the cloud, in terms of both accuracy and runtime performance on cost-effective, low-power edge devices. We found Kaldi to be the most accurate solution and also among the fastest. It runs more than fast enough on an Nvidia Jetson Nano. It is not yet on par with commercial cloud services, but it is getting close.