IMU2CLIP: Language-grounded Motion Sensor Translation with Multimodal Contrastive Learning

Published: 07 Oct 2023, Last Modified: 01 Dec 2023, EMNLP 2023 Findings
Submission Type: Regular Short Paper
Submission Track: Language Grounding to Vision, Robotics and Beyond
Submission Track 2: NLP Applications
Keywords: Contrastive Learning, NLP Applications in Sensor Signals
TL;DR: We develop a new method to translate IMU motion sensor signals into text, enabling novel applications such as low-power media search and reasoning on wearable devices.
Abstract: We present IMU2CLIP, a novel pre-training approach that aligns Inertial Measurement Unit (IMU) motion sensor recordings with text and video by projecting them into the joint representation space of Contrastive Language-Image Pre-training (CLIP). The proposed approach allows IMU2CLIP to translate human motions (as measured by IMU sensors) into their corresponding textual descriptions and videos -- while preserving the transitivity across these modalities. We introduce several new IMU-based Wearable AI applications, such as motion-based media search and LM-based multimodal reasoning with motion sensor data -- all using text as the grounding platform. In addition, we show that IMU2CLIP significantly improves downstream performance when fine-tuned for each application, demonstrating its utility as a new pre-trained resource. Our code and models will be released publicly.
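To make the described alignment concrete, below is a minimal sketch (not the authors' released code) of how an IMU encoder could be trained against a frozen CLIP embedding space with a symmetric contrastive (InfoNCE) loss. The encoder architecture, dimensions, and names such as `IMUEncoder` and `clip_contrastive_loss` are illustrative assumptions; in practice the target embeddings would come from CLIP's text or video encoders applied to the narration or clip paired with each IMU window.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IMUEncoder(nn.Module):
    """Encodes a window of IMU readings (B, T, 6: accel + gyro) into the CLIP embedding space."""
    def __init__(self, in_channels: int = 6, clip_dim: int = 512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 128, kernel_size=5, stride=2, padding=2),
            nn.GELU(),
            nn.Conv1d(128, 256, kernel_size=5, stride=2, padding=2),
            nn.GELU(),
        )
        self.proj = nn.Linear(256, clip_dim)

    def forward(self, imu: torch.Tensor) -> torch.Tensor:
        h = self.conv(imu.transpose(1, 2))  # (B, 256, T')
        h = h.mean(dim=-1)                  # temporal average pooling
        return F.normalize(self.proj(h), dim=-1)

def clip_contrastive_loss(imu_emb: torch.Tensor, clip_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric cross-entropy over IMU<->CLIP similarities, CLIP-style."""
    logits = imu_emb @ clip_emb.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Usage sketch: clip_emb stands in for frozen CLIP text/video embeddings of the paired data.
encoder = IMUEncoder()
imu = torch.randn(8, 200, 6)                             # 8 windows, 200 timesteps
clip_emb = F.normalize(torch.randn(8, 512), dim=-1)      # placeholder CLIP embeddings
loss = clip_contrastive_loss(encoder(imu), clip_emb)
loss.backward()
```

Once trained this way, IMU embeddings live in the same space as CLIP text and video embeddings, which is what enables applications like motion-based media retrieval by nearest-neighbor search over text or video.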
Submission Number: 4241