Abstract: This research introduces a framework for constructing networked artificial intelligence systems with a lightweight neural network front-end tailored for long and intricate sequential data, such as voice recordings and health signals. Our approach follows a client-server design pattern, yielding a compact and modular architecture that can be easily optimized for deployment on edge devices while still being able to incorporate more powerful backbone models. We tested the proposed blueprint on four problem domains: audio keyword spotting, speech emotion recognition, abnormal heart sound detection, and sentiment classification from social media text posts. The unweighted accuracies were 86%, 69%, 93%, and 95%, respectively, comparable or superior to other state-of-the-art methods that rely on pretrained models or pre-processing pipelines. Additionally, end-users' privacy is protected because their sensitive data are encoded and compressed before being sent over the network. Compactness, modularity, and privacy protection are essential aspects that machine learning practitioners should consider when designing networked AI applications for real-world scenarios.