mmCLIP: Boosting mmWave-based Zero-shot HAR via Signal-Text Alignment

Qiming Cao, Hongfei Xue, Tianci Liu, Xingchen Wang, Haoyu Wang, Xincheng Zhang, Lu Su

Published: 01 Jan 2024, Last Modified: 06 Nov 2025, SenSys 2024, CC BY-SA 4.0

Abstract: Millimeter-wave (mmWave) based human activity recognition (HAR) systems have demonstrated promising performance in various applications by leveraging deep neural networks. However, these systems suffer from a scarcity of mmWave data available for model training. To address this challenge, we explore transferring knowledge from large AI models built on massive text and visual data to improve the generalizability of mmWave-based HAR models. To this end, we introduce mmCLIP, a novel system that aligns the mmWave signal space with the text space to enable zero-shot recognition of unseen activities. To enable this alignment, we employ cross-modality signal synthesis, which augments mmWave signal data using large human mesh datasets, and design an activity attribute decomposition and recomposition approach to characterize the semantic interconnections among activities. Extensive experiments demonstrate the effectiveness of the proposed framework.
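
The idea of zero-shot recognition through an aligned signal and text space can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows the generic CLIP-style inference step, in which a signal embedding is compared by cosine similarity against text embeddings of candidate activity descriptions, including activities never seen in training. The function names, the activity labels, and the hand-made 3-dimensional vectors are all hypothetical; in a real system the embeddings would come from trained signal and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def zero_shot_classify(signal_embedding, text_embeddings):
    """Pick the activity whose text embedding is closest to the signal embedding.

    signal_embedding: list of floats (assumed output of a signal encoder).
    text_embeddings: dict mapping activity label -> list of floats
                     (assumed output of a text encoder).
    Returns (best_label, scores), where scores maps each label to its similarity.
    """
    scores = {
        label: cosine_similarity(signal_embedding, emb)
        for label, emb in text_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy, hand-made embeddings (assumption: stand-ins for real encoder outputs).
text_embeddings = {
    "walking": [0.9, 0.1, 0.0],
    "waving": [0.0, 0.2, 0.9],
    "sitting down": [0.1, 0.9, 0.1],
}
signal_embedding = [0.1, 0.3, 0.8]

label, scores = zero_shot_classify(signal_embedding, text_embeddings)
print(label)
```

Because classification reduces to a nearest-neighbor lookup among text embeddings, a new activity can be added at inference time by supplying its text description alone, which is what makes the aligned space useful when mmWave training data is scarce.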