DeltaSM: Delta-Level Contrastive Learning with Mamba for Time-Series Representation

TMLR Paper6914 Authors

08 Jan 2026 (modified: 30 Jan 2026) · Under review for TMLR · CC BY 4.0
Abstract: Self-supervised contrastive learning offers a compelling route to transferable time-series representations in label-scarce settings. Yet existing frameworks face a persistent trade-off between preserving fine-grained local dynamics at high temporal resolution and scaling to long sequences under practical compute constraints. Convolutional encoders often require deep stacks to retain rapid transitions, whereas Transformers incur quadratic cost in sequence length, making high-resolution long-context training expensive. Recent selective state-space models such as \emph{Mamba} enable linear-time ($O(L)$) sequence modeling and offer a promising path to mitigate this bottleneck. However, their potential for \emph{general-purpose} time-series representation learning remains underexplored; to our knowledge, prior Mamba-based contrastive learners have not been evaluated on the full UCR 2018 archive (128 datasets) under a unified protocol. We propose \textbf{DeltaSM} (\emph{Delta-selective Mamba}), a self-supervised framework for univariate time series that reconciles efficiency and expressivity. DeltaSM integrates (i) a lightweight Mamba backbone, (ii) token-budget-constrained training, and (iii) a $\Delta$-level contrastive objective that counterbalances Mamba's smoothing tendency. Specifically, we apply curvature-adaptive weighting to first-order differences of the latent sequence, encouraging the encoder to emphasize informative local transitions without increasing computational cost. At inference time, we further augment the learned time-domain embeddings with explicitly extracted frequency-domain descriptors from the raw signal to improve expressivity at negligible overhead. 
Across all 128 UCR datasets, under \textbf{Protocol A}---a unified compute setting with a fixed number of optimization steps and a standardized downstream classifier---DeltaSM converges in seconds and achieves classification accuracy comparable to or better than strong baselines such as TS-TCC, TS2Vec, and TimesURL, using a single global configuration and a fixed pretraining-step budget (300 optimization updates per dataset). On a focused subset that includes long-sequence datasets under \textbf{Protocol B}---where baselines are allowed their recommended training budgets and hyperparameters while DeltaSM remains fixed as in Protocol A---DeltaSM reduces pretraining time by up to $184\times$ while remaining competitive. Extensive ablations confirm that curvature-based weighting is crucial for suppressing noise while capturing local dynamics, and that inference-time frequency integration provides complementary gains with minimal additional cost.
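The $\Delta$-level objective described in the abstract weights first-order differences of the latent sequence by a curvature signal so that informative local transitions dominate the loss. The sketch below is a minimal, hedged illustration of that idea in NumPy, not the paper's implementation: it uses second differences as the curvature proxy and a per-step cosine alignment between the deltas of two augmented views; the function names `delta_level_alignment` and the edge-padding of the weight vector are illustrative assumptions.

```python
import numpy as np

def delta_level_alignment(z1, z2, eps=1e-8):
    """Curvature-weighted alignment of first-order latent differences.

    z1, z2: (T, d) latent sequences from two augmented views of the
    same series. Returns a scalar loss; lower means the local
    transitions (deltas) of the two views agree more closely.
    NOTE: illustrative sketch only, not the paper's exact objective.
    """
    d1 = np.diff(z1, axis=0)                 # (T-1, d) first-order deltas
    d2 = np.diff(z2, axis=0)
    # Curvature proxy: magnitude of change between consecutive deltas
    # (i.e. second differences of the latent sequence).
    curv = np.linalg.norm(np.diff(d1, axis=0), axis=1)   # (T-2,)
    w = np.pad(curv, (1, 0), mode="edge")                # align to (T-1,)
    w = w / (w.sum() + eps)                              # normalized weights
    # Per-step cosine similarity between matching deltas of the two views.
    cos = (d1 * d2).sum(axis=1) / (
        np.linalg.norm(d1, axis=1) * np.linalg.norm(d2, axis=1) + eps)
    # High-curvature steps contribute more to the alignment penalty.
    return float((w * (1.0 - cos)).sum())
```

Because the weights are normalized, the penalty stays bounded regardless of sequence length, which is consistent with the claim that the objective adds no computational cost beyond the encoder's forward pass.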
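The abstract also mentions augmenting learned time-domain embeddings with frequency-domain descriptors extracted from the raw signal at inference time. A plausible minimal realization, assuming the descriptors are low-order rFFT magnitudes concatenated onto the embedding (the exact descriptors are not specified in the abstract, so `k` and the log scaling here are assumptions), could look like:

```python
import numpy as np

def frequency_descriptors(x, k=8):
    """Log-scaled magnitudes of the first k rFFT bins of a raw 1-D signal.

    Illustrative assumption: the paper's descriptors may differ.
    """
    mag = np.abs(np.fft.rfft(np.asarray(x, dtype=float)))
    return np.log1p(mag[:k])

def augment_embedding(z_time, x_raw, k=8):
    """Concatenate a learned time-domain embedding with FFT descriptors.

    z_time: (d,) embedding from the pretrained encoder.
    x_raw:  (T,) raw univariate series the embedding was computed from.
    Returns a (d + k,) vector for the downstream classifier.
    """
    return np.concatenate([z_time, frequency_descriptors(x_raw, k)])
```

Since the FFT runs once per series at inference only, the overhead is $O(T \log T)$ per sample, negligible next to encoder pretraining, matching the "negligible overhead" claim.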
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~John_Timothy_Halloran1
Submission Number: 6914