Open Source Speech Recognition on Edge Devices

Published: 01 Jan 2020, Last Modified: 09 Nov 2024 | ACIT 2020 | CC BY-SA 4.0
Abstract: Over the last ten years, deep learning has revived the field of automatic speech recognition (ASR) and pushed recognition rates to levels on par with humans. Applications like Siri, Amazon Alexa and Google Assistant are very popular, but have inherent privacy problems. In this paper, we evaluate state-of-the-art open source ASR models regarding their usability in a smart speaker without a cloud connection, in terms of both accuracy and runtime performance on cost-effective, low-power edge devices. We found Kaldi to be the most accurate solution and also among the fastest. It runs more than fast enough on an Nvidia Jetson Nano. Its accuracy is still not on par with commercial cloud services, but it is getting close.