InterHandNet: Capturing Two-hand Interaction for Robust Hand-washing Activity Recognition

Yiqing Zhang, Takuya Maekawa

Published: 2025, Last Modified: 01 Mar 2026 · PerCom 2025 · License: CC BY-SA 4.0
Abstract: This study proposes a new deep learning method for hand-washing activity recognition using a series of hand skeleton data extracted from an RGB-D camera. Assessing hand-washing activity based on recognized hand-washing steps is crucial in both industrial and medical domains, as well as in promoting healthy habits. However, recognizing hand-washing activities presents unique challenges compared to typical single-person activity recognition due to the specific nature of hand-washing tasks. First, the steps of hand-washing are better explained by the interaction between two objects, i.e., the two hands, such as rubbing palms and fingers. Second, occlusion occurs much more frequently during hand-washing because the two hands interact so often. We therefore propose a new neural network tailored for hand-washing recognition, called InterHandNet, to address these challenges. To capture the interaction, we propose two novel modules in InterHandNet: Interaction Graph and Interaction Attention. These modules, respectively, enable the exchange of information across the skeleton graphs of the two hands within a graph neural network framework, and allow the network to focus on important keypoints in one hand by referencing the other hand through a query-key-value mechanism. To address missing data caused by occlusion, we propose Inter-hand Temporal Fusion, which fills in missing information by referencing data from the other hand and from other time steps within a time window. InterHandNet outperforms other state-of-the-art skeleton-based and RGB-based methods in accuracy, and significantly surpasses RGB-based methods in runtime efficiency on edge devices.
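The abstract does not give the exact formulation of Interaction Attention, but the core idea it describes (queries taken from one hand's keypoints, keys and values from the other hand) can be sketched as a standard cross-attention layer. Everything below is an illustrative assumption: the keypoint count (21, as in common hand-skeleton models), the feature dimension, and the projection weights are placeholders, not the paper's actual architecture.

```python
import numpy as np

def interaction_attention(hand_q, hand_kv, Wq, Wk, Wv):
    """Cross-hand attention sketch: keypoints of one hand attend to the other.

    hand_q:  (K, d) features of the hand being updated
    hand_kv: (K, d) features of the reference (other) hand
    Wq/Wk/Wv: (d, d) projection matrices (hypothetical, for illustration)
    """
    Q = hand_q @ Wq                    # queries from this hand
    Kmat = hand_kv @ Wk                # keys from the other hand
    V = hand_kv @ Wv                   # values from the other hand
    scores = Q @ Kmat.T / np.sqrt(Q.shape[-1])   # scaled dot-product
    scores -= scores.max(axis=-1, keepdims=True) # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)     # softmax over other-hand keypoints
    return attn @ V                    # other-hand information routed to each keypoint

rng = np.random.default_rng(0)
K, d = 21, 16                          # 21 keypoints per hand, feature dim 16 (assumed)
left = rng.normal(size=(K, d))
right = rng.normal(size=(K, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = interaction_attention(left, right, Wq, Wk, Wv)  # updated left-hand features, shape (21, 16)
```

In this reading, each keypoint of one hand produces a query that is matched against every keypoint of the other hand, so features such as "palm rubbing fingers" can be represented as attention weight between the corresponding joints; the paper's actual module may differ in heads, normalization, or how it combines with the Interaction Graph.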