DMSNet: A Lightweight and Efficient Facial Expression Recognition Model for IoT and WoT Applications

Hemraj Singh, Mridula Verma, Ramalingaswamy Cheruku

Published: 28 Jun 2025, Last Modified: 28 Jan 2026 | Web Conference 2025 | Everyone | CC BY 4.0

Abstract: Facial expression recognition (FER) plays a crucial role in computer vision, driving advances in gesture recognition, patient monitoring, and human-robot interaction. Traditional FER methods, however, struggle with geometric variations in facial expression features within static images, which often leads to an imbalance between model performance and network complexity. The resulting computational demands hinder deployment in resource-constrained environments such as Internet of Things (IoT), Web of Things (WoT), and mobile devices. To address these challenges, we propose the Deformable Multi-Scale Network (DMSNet), a lightweight and efficient model designed to dynamically capture multi-scale geometric variations in spatial expression features using depthwise separable convolution, deformable convolution, and Receptive Field Blocks (RFBs) while minimizing parameters. With just 5.6 million parameters, 100.8 million floating-point operations, and an inference speed of 30 frames per second (FPS), it is well suited to real-time applications. Extensive experiments on five benchmark datasets demonstrate superior performance compared to state-of-the-art (SOTA) FER models.
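
To give a rough sense of why depthwise separable convolution helps keep the parameter count low, the sketch below compares the weight count of a standard convolution with that of its depthwise separable counterpart. It is only an illustration of the general technique and not the DMSNet architecture: the function names and the layer sizes (3x3 kernel, 64 input channels, 128 output channels) are assumptions chosen for the example, and biases are ignored.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
K, C_IN, C_OUT = 3, 64, 128

std = standard_conv_params(K, C_IN, C_OUT)
dws = depthwise_separable_params(K, C_IN, C_OUT)

print(f"standard conv weights:            {std}")
print(f"depthwise separable conv weights: {dws}")
print(f"reduction factor:                 {std / dws:.2f}x")
```

For these assumed sizes the standard layer has 73,728 weights and the separable one 8,768, roughly an 8.4x reduction. The saving grows with the kernel size and the number of output channels.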