Hand Gesture Recognition Using a Multi-modal Deep Neural Network

Published: 01 Jan 2024, Last Modified: 23 Oct 2024 · Intelligent Information Processing (2) 2024 · CC BY-SA 4.0
Abstract: As the devices around us become more intelligent, new ways of interacting with them are sought to improve user convenience and comfort. While gesture-controlled systems have existed for some time, they either rely on specialized imaging equipment, demand unreasonable computing resources, or are simply not accurate enough to be a viable alternative. In this work, a reliable method of recognizing hand gestures is proposed. The model correctly classifies hand gestures for keyboard typing based on activity captured by an ordinary camera. Two baseline models are first developed: one classifies the video data directly, and the other classifies time-series sequences of skeleton data extracted from the video. The models use different classification strategies and are built on lightweight architectures. These two baselines are then integrated into a single multimodal model with multiple inputs, i.e., a video input and a time-series input, to improve accuracy. The performance of the baseline models is then compared to that of the multimodal classifier. Since the multimodal classifier is built from the two baseline models, it inherits the benefits of both architectures and achieves a test accuracy of 100%, compared to 85% and 75% for the two baseline models, respectively.
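The fusion described above can be sketched as a two-branch network: one branch processes the video clip, the other processes the skeleton time series, and their features are concatenated before a shared classifier head. This is only a minimal illustration; the branch architectures, layer sizes, joint count, and fusion strategy below are assumptions, since the abstract states only that the two lightweight baseline models are integrated into one multi-input model.

```python
# Hypothetical sketch of a two-branch multimodal gesture classifier.
# All layer sizes and the concatenation-based fusion are assumptions,
# not the paper's actual architecture.
import torch
import torch.nn as nn

class MultiModalGestureNet(nn.Module):
    def __init__(self, num_classes: int = 10, num_joints: int = 21):
        super().__init__()
        # Video branch: lightweight 3D CNN over (C, T, H, W) clips.
        self.video_branch = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),   # -> (N, 16, 1, 1, 1)
            nn.Flatten(),              # -> (N, 16)
        )
        # Skeleton branch: LSTM over per-frame joint coordinates.
        self.skeleton_branch = nn.LSTM(
            input_size=num_joints * 2,  # (x, y) per joint
            hidden_size=32,
            batch_first=True,
        )
        # Fusion by feature concatenation, then a small classifier head.
        self.head = nn.Linear(16 + 32, num_classes)

    def forward(self, video: torch.Tensor, skeleton: torch.Tensor) -> torch.Tensor:
        v = self.video_branch(video)                 # (N, 16)
        _, (h, _) = self.skeleton_branch(skeleton)
        s = h[-1]                                    # (N, 32), last hidden state
        return self.head(torch.cat([v, s], dim=1))

# Shape check with dummy data: 2 clips of 8 RGB frames at 32x32 pixels,
# plus matching 8-step skeleton sequences of 21 (x, y) joints.
model = MultiModalGestureNet()
logits = model(torch.randn(2, 3, 8, 32, 32), torch.randn(2, 8, 42))
print(tuple(logits.shape))  # (2, 10)
```

Concatenation is the simplest late-fusion choice; it lets each branch specialize in its own modality while the head learns how to weight their evidence jointly.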