Multi-view Self-Supervised Contrastive Learning for Multivariate Time Series

Yuhan Wu; Xiyu Meng; Yang He; Junru Zhang; Haowen Zhang; Yabo Dong; Dongming Lu

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, MM2024 Poster, CC BY 4.0

Abstract: Learning semantically rich representations from unlabeled time series with intricate dynamics is a notable challenge. Traditional contrastive learning techniques focus predominantly on segment-level augmentations obtained by time slicing, a practice that, while valuable, often causes sampling bias and suboptimal performance because global context is lost. Furthermore, these techniques typically disregard frequency information, which is vital for enriching data representations. To address these issues, we propose a novel general-purpose self-supervised framework called Temporal-Frequency and Contextual Consistency (TFCC). Specifically, TFCC first applies two families of instance-level augmentations over the entire series to capture nuanced representations together with critical long-term dependencies. Next, it sets up dual cross-view forecasting tasks between the original series and its augmented counterpart in both the time and frequency domains to learn robust representations. Finally, three specially designed consistency modules (temporal, frequency, and temporal-frequency) further build discriminative representations on top of these robust representations. Extensive experiments on multiple benchmark datasets demonstrate TFCC's superiority over state-of-the-art classification and forecasting methods and show exceptional efficiency in semi-supervised and transfer learning scenarios. Code, data, and model checkpoints will be released after the review period.
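The abstract names TFCC's ingredients but not their exact form, so what follows is only a toy NumPy sketch under stated assumptions, not the authors' implementation. Jitter-plus-scaling and random masking of rFFT bins stand in for the two instance-level augmentation families; the rFFT amplitude spectrum serves as the frequency view; a fixed random projection replaces the learned encoder; and an InfoNCE loss is assumed for each of the three consistency terms. All names (`time_augment`, `freq_augment`, `freq_view`, `encode`, `info_nce`) and hyperparameters are hypothetical, and the cross-view forecasting task is not reproduced.

```python
import numpy as np

# Toy sketch: augmentations, encoder, and losses are assumptions, not TFCC's published method.


def time_augment(x, rng, jitter_sigma=0.05, scale_sigma=0.1):
    """Hypothetical time-domain family: per-channel scaling plus jitter over the whole series.

    x has shape (batch, channels, length); no time slicing is performed.
    """
    scale = rng.normal(1.0, scale_sigma, size=x.shape[:-1] + (1,))
    noise = rng.normal(0.0, jitter_sigma, size=x.shape)
    return x * scale + noise


def freq_augment(x, rng, drop_ratio=0.1):
    """Hypothetical frequency-domain family: zero a random subset of rFFT bins."""
    spec = np.fft.rfft(x, axis=-1)
    keep = rng.random(spec.shape) >= drop_ratio
    return np.fft.irfft(spec * keep, n=x.shape[-1], axis=-1)


def freq_view(x):
    """Frequency-domain view: amplitude spectrum of each channel."""
    return np.abs(np.fft.rfft(x, axis=-1))


def encode(view, w):
    """Stand-in encoder: flatten each sample and apply a fixed random projection."""
    return view.reshape(view.shape[0], -1) @ w


def info_nce(z1, z2, tau=0.2):
    """InfoNCE loss; row i of z1 and row i of z2 form the positive pair."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


B, C, L, D = 8, 3, 64, 16
rng = np.random.default_rng(0)
x = rng.normal(size=(B, C, L))  # toy batch of multivariate series

# Augmented counterpart: both families applied to the entire series.
x_aug = freq_augment(time_augment(x, rng), rng)

F = L // 2 + 1  # number of rFFT bins
w_t = rng.normal(size=(C * L, D)) / np.sqrt(C * L)
w_f = rng.normal(size=(C * F, D)) / np.sqrt(C * F)

z_t, z_t_aug = encode(x, w_t), encode(x_aug, w_t)
z_f, z_f_aug = encode(freq_view(x), w_f), encode(freq_view(x_aug), w_f)

temporal = info_nce(z_t, z_t_aug)        # original vs. augmented, time domain
frequency = info_nce(z_f, z_f_aug)       # original vs. augmented, frequency domain
temporal_frequency = info_nce(z_t, z_f)  # time vs. frequency view of the same sample
total = temporal + frequency + temporal_frequency
print(f"temporal={temporal:.3f} frequency={frequency:.3f} "
      f"temporal-frequency={temporal_frequency:.3f} total={total:.3f}")
```

Both augmentations act on the whole series rather than on a sliced segment, which is the instance-level idea the abstract contrasts with time slicing. In a trained model the projections would be learned encoders, and the three consistency terms would presumably be weighted and minimized jointly with the forecasting objectives.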

Primary Subject Area: [Engagement] Emotional and Social Signals

Relevance To Conference: Time series are pivotal in analyzing multimedia signals. Our primary objective is to use contrastive learning to extract temporal and frequency-domain information and thereby learn a universal representation of time series data that can be applied to downstream time series classification and forecasting tasks. For instance, it supports classification on public datasets for human action recognition, epilepsy, and sleep (Sleep-EDF), as well as forecasting signals such as weather patterns and electricity usage. This approach provides effective tools and methods for feature extraction, classification, and forecasting on multimedia signals.

Supplementary Material: zip

Submission Number: 2538
