Giving Sensors a Voice: Multimodal JEPA for Semantic Time-Series Embeddings

Published: 01 Mar 2026, Last Modified: 11 Apr 2026
Venue: ICLR 2026 TSALM Workshop Poster
License: CC BY 4.0
Presentation Attendance: No, we cannot present in-person
Keywords: Time Series, Representation Learning, Self-Supervised Learning, JEPA, Multimodal, Channel-Aware
Abstract: We introduce CHARM (Channel-Aware Representation Model), a multimodal architecture for self-supervised time series representation learning that incorporates channel-level textual descriptions into both temporal convolutional and attention layers. This enables the model to reason about sensor identity and inter-channel relationships while remaining invariant to channel ordering. Trained with a Joint Embedding Predictive Architecture (JEPA), CHARM learns temporally stable, noise-robust embeddings by predicting in latent space rather than reconstructing raw signals. Across classification, forecasting, and anomaly detection benchmarks, CHARM's frozen embeddings with a lightweight linear probe match or outperform significantly larger task-specific foundation models.
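The JEPA objective described in the abstract, predicting masked content in latent space with a context encoder, an EMA target encoder, and a small predictor, rather than reconstructing the raw signal, can be illustrated with a toy NumPy sketch. This is not CHARM's actual architecture; the linear encoders, dimensions, and mask span below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: C channels, T timesteps, D latent dims.
C, T, D = 3, 32, 8

def encode(x, W):
    # Toy "encoder": a linear map over the time axis plus a tanh.
    return np.tanh(x @ W)

W_ctx = rng.normal(scale=0.1, size=(T, D))  # context encoder weights
W_tgt = W_ctx.copy()                        # target encoder (EMA copy)
W_pred = np.eye(D)                          # toy latent-space predictor

x = rng.normal(size=(C, T))   # one multichannel time series
mask = np.zeros(T, dtype=bool)
mask[16:24] = True            # span the context encoder does not see

x_ctx = x.copy()
x_ctx[:, mask] = 0.0          # masked view fed to the context encoder

z_ctx = encode(x_ctx, W_ctx)  # (C, D) context embeddings
z_tgt = encode(x, W_tgt)      # (C, D) targets from the full signal
z_hat = z_ctx @ W_pred        # prediction happens in latent space

# JEPA-style loss: compare embeddings, never raw samples.
loss = np.mean((z_hat - z_tgt) ** 2)

# EMA update of the target encoder (with stop-gradient in practice).
tau = 0.99
W_tgt = tau * W_tgt + (1 - tau) * W_ctx
```

Because the loss is taken between embeddings, the model is free to discard unpredictable noise in the masked span, which is the source of the noise-robustness the abstract claims.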
Track: Research Track (max 4 pages)
Submission Number: 20