ZeroHAR: Sensor Context Augments Zero-Shot Wearable Action Recognition

Published: 01 Jan 2025, Last Modified: 21 Jul 2025, AAAI 2025, CC BY-SA 4.0
Abstract: Wearable Human Action Recognition (wHAR) uses motion sensor data to identify human movements, a capability essential for mobile and wearable devices. However, traditional wHAR systems are trained on a limited set of activities and therefore fail to generalize to the diversity of human motion, motivating Zero-Shot Learning (ZSL). Existing ZSL methods for wHAR focus solely on augmenting the labels, for example by representing them as attribute matrices, images, videos, or text. We propose ZeroHAR, which enhances ZSL by augmenting not only the activity labels but also the motion data itself with sensor context features. Our approach incorporates the sensor type, the Cartesian axis of the data, and the sensor's position on the body, providing the model with crucial spatial and biomechanical cues that help it generalize to unseen actions. We first train the model by aligning the latent space of the motion time series with its corresponding sensor context while distancing it from unrelated sensor contexts, and then train it on the target activity descriptions. We tested our method against eight baselines on five benchmark HAR datasets with diverse sensors, placements, and activities. Our model shows exceptional generalizability across 18 motion time-series classification benchmark datasets, outperforming the best baseline by 262% in the zero-shot setting.
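The first training stage described in the abstract is a contrastive alignment between a motion time-series encoder and an embedding of the sensor context (sensor type, Cartesian axis, body position). Below is a minimal sketch of what such an alignment could look like; it is not the authors' implementation, and all module names, dimensions, encoder choices, and the InfoNCE-style loss are assumptions made for illustration.

```python
# Hypothetical sketch of sensor-context alignment (not the ZeroHAR code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionEncoder(nn.Module):
    """Encodes a (batch, channels, time) IMU window into a fixed-size vector."""
    def __init__(self, in_channels=6, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, embed_dim, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)            # (batch, embed_dim)

class SensorContextEncoder(nn.Module):
    """Embeds categorical sensor context: sensor type, axis, body position."""
    def __init__(self, n_types=4, n_axes=3, n_positions=8, embed_dim=128):
        super().__init__()
        self.type_emb = nn.Embedding(n_types, embed_dim)
        self.axis_emb = nn.Embedding(n_axes, embed_dim)
        self.pos_emb = nn.Embedding(n_positions, embed_dim)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, sensor_type, axis, position):
        ctx = self.type_emb(sensor_type) + self.axis_emb(axis) + self.pos_emb(position)
        return self.proj(ctx)                      # (batch, embed_dim)

def contrastive_alignment_loss(motion_z, context_z, temperature=0.07):
    """InfoNCE: pull each window toward its own sensor context, push away others."""
    motion_z = F.normalize(motion_z, dim=-1)
    context_z = F.normalize(context_z, dim=-1)
    logits = motion_z @ context_z.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(motion_z.size(0))          # positives lie on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random data: 16 windows of 6-channel IMU data, 100 samples each.
if __name__ == "__main__":
    motion_enc, ctx_enc = MotionEncoder(), SensorContextEncoder()
    x = torch.randn(16, 6, 100)
    sensor_type = torch.randint(0, 4, (16,))
    axis = torch.randint(0, 3, (16,))
    position = torch.randint(0, 8, (16,))
    loss = contrastive_alignment_loss(motion_enc(x), ctx_enc(sensor_type, axis, position))
    print(loss.item())
```

The second stage described in the abstract would replace the sensor-context branch with embeddings of textual activity descriptions, which at test time allows unseen activities to be recognized by matching motion embeddings to the nearest description embedding.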