Keywords: Wearables, Foundation Models, Convolutional Neural Networks, Efficiency, Inductive Bias
TL;DR: A lightweight U-Net–based masked autoencoder foundation model for wearable PPG signals that achieves competitive clinical classification accuracy while being two to three orders of magnitude more efficient than transformer baselines
Abstract: We propose a lightweight foundation model for wearable signals that leverages convolutional inductive biases within a masked autoencoder and U-Net CNN backbone. By explicitly encoding temporal locality and multi-scale structure, our approach aligns more naturally with the nonstationary dynamics of physiological waveforms than attention-based transformers. Pretrained on 80k hours of photoplethysmogram (PPG) data, the model matches or surpasses larger state-of-the-art baselines across ten clinical classification tasks. At the same time, it achieves reductions of two to three orders of magnitude in parameters (0.31M vs. 110M), memory footprint (3.6 MB vs. 441.3 MB), and compute, while delivering substantial speedups ($\sim$4× on CPU, $\sim$20× on GPU) and offering resolution flexibility. Together, these results establish compact convolutional self-supervised models as scientifically aligned and practically scalable foundations for real-time, on-device healthcare monitoring.
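The abstract gives no implementation details, so the following PyTorch sketch is only a rough illustration of what a lightweight 1D U-Net masked autoencoder for PPG might look like; the layer widths, kernel sizes, mask ratio, span length, and sampling rate are all assumptions for illustration and are not taken from the paper.

```python
# Minimal sketch (not the authors' code): a lightweight 1D U-Net masked
# autoencoder for PPG-like signals. All hyperparameters below are assumed.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """Two 1D convolutions with BatchNorm and ReLU (temporal locality bias)."""
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, kernel_size=5, padding=2),
        nn.BatchNorm1d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv1d(out_ch, out_ch, kernel_size=5, padding=2),
        nn.BatchNorm1d(out_ch),
        nn.ReLU(inplace=True),
    )


class UNet1DMAE(nn.Module):
    """U-Net encoder/decoder that reconstructs masked spans of a 1D signal."""

    def __init__(self, channels=(16, 32, 64), mask_ratio=0.5, span=64):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.span = span  # length of each masked segment, in samples
        # Encoder: conv blocks with downsampling (multi-scale structure)
        self.enc1 = conv_block(1, channels[0])
        self.enc2 = conv_block(channels[0], channels[1])
        self.enc3 = conv_block(channels[1], channels[2])
        self.pool = nn.MaxPool1d(2)
        # Decoder: upsample and fuse skip connections
        self.up2 = nn.ConvTranspose1d(channels[2], channels[1], 2, stride=2)
        self.dec2 = conv_block(channels[1] * 2, channels[1])
        self.up1 = nn.ConvTranspose1d(channels[1], channels[0], 2, stride=2)
        self.dec1 = conv_block(channels[0] * 2, channels[0])
        self.head = nn.Conv1d(channels[0], 1, kernel_size=1)

    def random_span_mask(self, x):
        """Zero out random contiguous spans covering ~mask_ratio of the signal."""
        b, _, t = x.shape
        mask = torch.ones(b, 1, t, device=x.device)
        n_spans = max(1, int(self.mask_ratio * t / self.span))
        for i in range(b):
            starts = torch.randint(0, t - self.span, (n_spans,))
            for s in starts:
                mask[i, :, s:s + self.span] = 0.0
        return mask

    def forward(self, x):
        mask = self.random_span_mask(x)
        e1 = self.enc1(x * mask)          # full resolution
        e2 = self.enc2(self.pool(e1))     # 1/2 resolution
        e3 = self.enc3(self.pool(e2))     # 1/4 resolution (bottleneck)
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        recon = self.head(d1)
        # Reconstruction loss computed only over masked positions
        loss = ((recon - x) ** 2 * (1 - mask)).sum() / (1 - mask).sum().clamp(min=1)
        return recon, loss


if __name__ == "__main__":
    # Example: a 30 s PPG window at an assumed 64 Hz sampling rate
    model = UNet1DMAE()
    ppg = torch.randn(8, 1, 1920)
    _, loss = model(ppg)
    loss.backward()
    print(f"params: {sum(p.numel() for p in model.parameters()):,}, loss: {loss.item():.4f}")
```

In this sketch the convolutional encoder/decoder carries the inductive biases the abstract emphasizes: local kernels encode temporal locality, and the pooling/upsampling path with skip connections provides multi-scale structure, while the span masking plays the role of the masked-autoencoder pretraining objective.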
Submission Number: 45