Keywords: time series, JEPA, self-supervised learning, representation learning
TL;DR: TF-JEPA is a non-contrastive self-supervised method that predictively aligns time-domain and frequency-domain views of multivariate time series, improving transfer learning while reducing computational cost relative to contrastive methods.
Abstract: Learning generalizable representations from multivariate time series is challenging due to complex temporal dynamics, distribution shifts, and the difficulty of designing effective contrastive pairs. We introduce TF-JEPA, a non-contrastive self-supervised method that uses predictive alignment to integrate representations from the time and frequency domains without relying on negative sampling. Specifically, TF-JEPA employs dual online encoders for the time and frequency domains, each paired with its own momentum-updated target encoder, embedding both views into a stable, unified latent space. Unlike conventional contrastive methods, this predictive approach enables full end-to-end fine-tuning for downstream adaptation. Experimental results on diverse real-world datasets, including sleep EEG classification, gesture recognition, mechanical fault detection, and biosignal-based muscle response classification, demonstrate that TF-JEPA matches or surpasses contrastive and time-frequency consistency baselines. TF-JEPA improves macro F1 scores by up to 8.6 percentage points while also reducing GPU memory consumption by approximately 35%. These findings illustrate the promise of predictive alignment as a broadly applicable, modality-agnostic framework for self-supervised learning beyond traditional contrastive methods.
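The core training step described in the abstract, two online encoders whose outputs are regressed onto momentum-updated target embeddings of the opposite view, can be sketched as follows. This is a hypothetical minimal illustration with linear encoders; all names, shapes, and the momentum coefficient are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D, E = 16, 8  # input window length, embedding dimension (illustrative)

# Online encoders for the time and frequency views (linear for brevity),
# plus their momentum-updated target counterparts, initialized as copies.
W_t = rng.normal(size=(D, E)); W_f = rng.normal(size=(D, E))
T_t = W_t.copy();              T_f = W_f.copy()
# Cross-view predictors mapping one view's embedding onto the other's target.
P_t = np.eye(E);               P_f = np.eye(E)

def ema(target, online, m=0.99):
    # Momentum (exponential moving average) update of target parameters:
    # target <- m * target + (1 - m) * online. No gradients flow here.
    return m * target + (1 - m) * online

def jepa_loss(x_time, x_freq):
    # Online embeddings of each view.
    z_t, z_f = x_time @ W_t, x_freq @ W_f
    # Target embeddings from the momentum encoders (treated as stop-gradient).
    h_t, h_f = x_time @ T_t, x_freq @ T_f
    # Predictive alignment: predict each view's target from the other view's
    # online embedding; no negative samples are needed.
    return np.mean((z_f @ P_t - h_t) ** 2) + np.mean((z_t @ P_f - h_f) ** 2)

# A batch of time-domain windows and a crude frequency view via the FFT
# magnitude (a stand-in for whatever frequency transform the method uses).
x = rng.normal(size=(4, D))
x_hat = np.abs(np.fft.fft(x, axis=1))

loss = jepa_loss(x, x_hat)
# After the gradient step on the online encoders and predictors (omitted),
# the targets are refreshed with the momentum update:
T_t, T_f = ema(T_t, W_t), ema(T_f, W_f)
```

In a real implementation the encoders would be deep networks trained by backpropagation through `jepa_loss`, with the target branches excluded from the gradient computation and updated only via `ema`.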
Supplementary Material: zip
Primary Area: learning on time series and dynamical systems
Submission Number: 21041