Keywords: ASR, Reservoir Computing, Edge Computing, Neuromorphic Computing
Abstract: Automatic Speech Recognition (ASR) enables seamless interaction between humans and LLM-powered AI. Current state-of-the-art ASR models are transformer-based neural networks that achieve very high accuracy at the cost of high complexity, partly due to their attention mechanisms. Whisper Large, the latest ASR model by OpenAI, has 1.55bn parameters (~2.9 GB) and is reported by users to require around 12 GB of VRAM to run. A model of this size has to be deployed in the cloud, which introduces network latency, slowing response times and degrading the user experience. Edge devices cannot host a model of this size: the latest iPhone 15 Pro Max is estimated to have around 8 GB of RAM. Because of these limited resources, current edge-based ASR models also struggle with accuracy. Reservoir Computing offers the potential for a new generation of edge-based ASR models with low latency and high accuracy.
Submission Number: 50