CM-ASAP: Cross-Modality Adaptive Sensing and Perception for Efficient Hand Gesture Recognition

Published: 01 Jan 2024 · Last Modified: 13 Nov 2024 · MIPR 2024 · CC BY-SA 4.0
Abstract: Hand Gesture Recognition (HGR) is an important component of multimedia solutions and human-computer interaction systems. Utilizing multimodal information, e.g., from both camera and depth sensors, to improve recognition accuracy usually comes at the cost of additional power consumption. In this paper, we introduce a novel framework for cross-modality adaptive sensing and perception (CM-ASAP) using multimodal Deep Neural Networks (DNNs). Inspired by human cognition and multimodal perception, our proposed framework dynamically allocates computation and sensing resources across modalities to optimize the trade-off between efficiency and accuracy of multimodal HGR DNNs. CM-ASAP accounts for the dominance of sensor power consumption over computational cost, as well as for the variance in the effectiveness of modality fusion. Furthermore, CM-ASAP exploits the potential for early gesture classification based on the initial frames of a sequence.
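The abstract is the only description available here, so the following is a minimal, hypothetical sketch of the general idea it describes: a confidence-based early exit on RGB-only features, plus a lightweight gate that decides whether activating the depth sensor (and its encoder) is worth the extra power. All module names, layer sizes, thresholds, and the `get_depth` callback are illustrative assumptions, not the authors' CM-ASAP implementation.

```python
# Illustrative sketch only: cross-modality gating + early exit for HGR.
# Architecture, sizes, and thresholds are assumptions for illustration;
# this is not the CM-ASAP implementation from the paper.
import torch
import torch.nn as nn


class GatedMultimodalHGR(nn.Module):
    def __init__(self, num_classes=10, feat_dim=64):
        super().__init__()
        # Lightweight per-modality encoders (stand-ins for real backbones).
        self.rgb_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim))
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim))
        # Gate: decides from RGB features whether depth sensing is worthwhile.
        self.gate = nn.Linear(feat_dim, 1)
        # Early-exit head on RGB-only features and a fused classifier.
        self.early_head = nn.Linear(feat_dim, num_classes)
        self.fused_head = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, rgb, get_depth, conf_thresh=0.9, gate_thresh=0.5):
        """rgb: (B, 3, H, W) initial frame; get_depth: callable that acquires
        a (B, 1, H, W) depth frame only when the gate requests it."""
        f_rgb = self.rgb_encoder(rgb)
        early_logits = self.early_head(f_rgb)
        early_conf = early_logits.softmax(-1).max(-1).values

        # Early exit: if the RGB-only prediction is confident enough,
        # skip depth sensing and the fusion branch entirely.
        if bool((early_conf >= conf_thresh).all()):
            return early_logits, {"used_depth": False, "early_exit": True}

        # Otherwise consult the gate before paying the depth-sensor cost.
        if bool((torch.sigmoid(self.gate(f_rgb)) >= gate_thresh).any()):
            f_depth = self.depth_encoder(get_depth())
            fused = self.fused_head(torch.cat([f_rgb, f_depth], dim=-1))
            return fused, {"used_depth": True, "early_exit": False}

        return early_logits, {"used_depth": False, "early_exit": False}


if __name__ == "__main__":
    model = GatedMultimodalHGR()
    rgb = torch.randn(1, 3, 64, 64)
    logits, info = model(rgb, get_depth=lambda: torch.randn(1, 1, 64, 64))
    print(logits.shape, info)
```

The key design point, under these assumptions, is that the depth frame is acquired lazily via a callback: if the early exit fires or the gate declines, the depth sensor is never powered on, which reflects the abstract's emphasis that sensing power dominates computation cost.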