Leveraging Shared Prototypes for a Multimodal Pulse Motion Foundation Model

ICLR 2026 Conference Submission20464 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: biosignals, time-series, foundation model, multimodal learning, prototype-based learning, self-supervised learning

TL;DR: We demonstrate that aligning heterogeneous wearable sensor modalities, such as PPG and accelerometry, through a shared dictionary of learned prototypes is an effective pre-training strategy.

Abstract: Modeling multimodal time-series data is critical for capturing system-level dynamics, particularly in biosignals, where modalities such as ECG, PPG, EDA, and accelerometry provide complementary perspectives on interconnected physiological processes. While recent advances in self-supervised learning (SSL) have improved unimodal representation learning, existing multimodal approaches often rely on CLIP-style contrastive objectives that overfit to easily aligned features and misclassify valid cross-modal relationships as negatives, resulting in fragmented and non-generalizable embeddings. To overcome these limitations, we propose ProtoMM, a novel SSL framework that introduces a shared prototype dictionary to anchor heterogeneous modalities in a common embedding space. By clustering representations around shared prototypes rather than relying on explicit negative sampling, our method captures complementary information across modalities and provides a coherent “common language” for physiological signals. In this work, we focus on developing a Pulse Motion foundation model with ProtoMM and demonstrate that our approach outperforms contrastive-only and prior multimodal SSL methods, achieving state-of-the-art performance while offering improved interpretability of learned features.
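The abstract does not spell out the ProtoMM objective, so the following is only an illustrative sketch of how a shared prototype dictionary can align two modalities without negative pairs. It assumes a simplified swapped-prediction loss in the style of prototype-based SSL methods such as SwAV: each modality's embedding is softly assigned to the same set of prototypes, and each modality is trained to predict the other's assignment. All names (`prototype_probs`, `swapped_prototype_loss`, `ppg_emb`, `acc_emb`), the temperature value, and the random toy vectors are hypothetical and are not taken from the paper.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def prototype_probs(embedding, prototypes, temperature=0.1):
    """Softmax over cosine similarities between one embedding and every prototype."""
    z = l2_normalize(embedding)
    logits = [
        sum(a * b for a, b in zip(z, l2_normalize(p))) / temperature
        for p in prototypes
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between two discrete distributions over prototypes."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


def swapped_prototype_loss(ppg_emb, acc_emb, prototypes, temperature=0.1):
    """Symmetric loss: each modality predicts the other's prototype assignment.

    Assumption: this is a simplification. Practical prototype-based methods
    usually add a mechanism to prevent all samples collapsing onto one prototype
    (for example, balanced assignments), which is omitted here.
    """
    p_ppg = prototype_probs(ppg_emb, prototypes, temperature)
    p_acc = prototype_probs(acc_emb, prototypes, temperature)
    return 0.5 * (cross_entropy(p_ppg, p_acc) + cross_entropy(p_acc, p_ppg))


# Toy data: 4 shared prototypes and one time-aligned PPG/accelerometry pair,
# all in an 8-dimensional embedding space (random stand-ins for encoder outputs).
rng = random.Random(0)
prototypes = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
ppg_emb = [rng.gauss(0, 1) for _ in range(8)]
acc_emb = [rng.gauss(0, 1) for _ in range(8)]

loss = swapped_prototype_loss(ppg_emb, acc_emb, prototypes)
```

Note that the loss involves only the paired embeddings and the shared prototypes; no other samples in a batch are treated as negatives, which is the property the abstract contrasts with CLIP-style objectives.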

Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)

Submission Number: 20464
