Keywords: Speech Emotion Recognition (SER), Deep Learning, Multilayer Perceptron (MLP), Adaptive Quantization, Edge Devices
TL;DR: Layer-Importance guided Adaptive Quantization for Efficient Speech Emotion Recognition
Abstract: Speech Emotion Recognition (SER) systems are crucial for enhancing human-machine interaction. Deep learning models have achieved significant success in SER without manually engineered features, but they require substantial computational resources and extensive hyperparameter tuning, limiting their deployment on edge devices. To address these limitations, we propose an efficient and lightweight Multilayer Perceptron (MLP) classifier within a custom SER framework. Furthermore, we introduce a novel adaptive quantization scheme based on layer importance to reduce model size. This method balances model compression and performance by adaptively selecting the bit-width precision for each layer according to its importance, ensuring that the quantized model maintains accuracy within an acceptable threshold. Unlike previous mixed-precision methods, which are often complex and costly, our approach is both interpretable and efficient. Our model is evaluated on benchmark SER datasets using features such as Mel-Frequency Cepstral Coefficients (MFCCs), Chroma, and Mel-spectrograms. Our experiments show that our quantization scheme achieves performance comparable to state-of-the-art methods while significantly reducing model size, making it well-suited for lightweight devices.
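The core idea of layer-importance guided adaptive quantization can be illustrated with a minimal sketch. The snippet below is an assumption-laden toy, not the paper's implementation: it uses uniform symmetric quantization, hypothetical per-layer importance scores, and an arbitrary threshold (0.5) to assign higher bit-widths to more important layers.

```python
import numpy as np

def quantize(weights, bits):
    """Uniform symmetric quantization of a weight tensor to `bits` bits."""
    levels = 2 ** (bits - 1) - 1          # number of positive quantization levels
    scale = np.max(np.abs(weights)) / levels
    if scale == 0:
        return weights.copy()
    return np.round(weights / scale) * scale

def assign_bit_widths(importances, high_bits=8, low_bits=4, threshold=0.5):
    """Keep higher precision for more important layers.

    Importance scores are normalized to [0, 1]; layers at or above the
    (assumed) threshold get `high_bits`, the rest get `low_bits`.
    """
    imp = np.asarray(importances, dtype=float)
    norm = imp / imp.max()
    return [high_bits if s >= threshold else low_bits for s in norm]

# Toy 3-layer MLP with made-up per-layer importance scores
rng = np.random.default_rng(0)
layers = [rng.standard_normal((16, 32)),
          rng.standard_normal((32, 32)),
          rng.standard_normal((32, 4))]
importances = [0.9, 0.3, 0.7]             # hypothetical importance estimates
bits = assign_bit_widths(importances)     # -> [8, 4, 8]
quantized = [quantize(w, b) for w, b in zip(layers, bits)]
```

In a full pipeline, the importance scores would come from a measured criterion (for example, the accuracy drop when quantizing each layer in isolation), and the threshold or bit-width set would be chosen so the quantized model stays within the accepted accuracy tolerance.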
Submission Number: 59