Progressive Memory Transformers: Memory-Aware Attention for Time Series

ICLR 2026 Conference Submission 13829 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Time-Series Analysis, Stateful Transformers, Contrastive Time-Series Learning
TL;DR: A hierarchical contrastive learning framework that captures both local and global temporal patterns in time series via a progressive memory attention architecture.
Abstract: Self-supervised learning has become the de facto strategy for time‑series domains where labeled data are scarce, yet most existing objectives emphasize \emph{either} local continuity \emph{or} global shape, seldom both. We introduce the \textbf{Progressive Memory Transformer} (PMT), a lightweight transformer backbone that maintains a writable memory bank across overlapping windows, allowing representations to accumulate evidence from short, medium, and long horizons without re‑reading the entire sequence. On top of this memory-aware attention, we formulate a hierarchical contrastive protocol that aligns embeddings at three complementary granularities---tokens, windows, and full sequences---through a token-window Gaussian loss, a memory‑state loss, and a global \texttt{[CLS]} loss. Together, PMT and these multi‑scale objectives yield a task‑agnostic model for time‑series data, providing strong features even when only $1$--$5\%$ of labels are available. We validate the approach on classification tasks across seven UCR/UEA/UCI benchmarks.
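The abstract describes a writable memory bank carried across overlapping windows and read through memory-aware attention. The sketch below illustrates one plausible way to realize that idea in PyTorch; the class name, memory size, and GRU-based write rule are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of memory-aware attention over overlapping windows.
# Assumptions (not from the paper): a learnable initial memory bank, keys/values
# formed by concatenating memory with the current window, and a GRU write rule.
import torch
import torch.nn as nn

class MemoryAwareAttention(nn.Module):
    def __init__(self, d_model=128, n_heads=4, mem_slots=16):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Learnable initial memory bank: (mem_slots, d_model)
        self.init_memory = nn.Parameter(torch.randn(mem_slots, d_model) * 0.02)
        self.mem_update = nn.GRUCell(d_model, d_model)  # assumed write rule

    def forward(self, window, memory=None):
        # window: (batch, win_len, d_model); memory: (batch, mem_slots, d_model)
        b = window.size(0)
        if memory is None:
            memory = self.init_memory.unsqueeze(0).expand(b, -1, -1)
        # Tokens attend jointly to the current window and the memory bank,
        # so long-horizon evidence is available without re-reading the sequence.
        kv = torch.cat([memory, window], dim=1)
        out, _ = self.attn(window, kv, kv)
        # Write step: summarize the window and update each memory slot.
        summary = out.mean(dim=1)                        # (batch, d_model)
        flat_mem = memory.reshape(-1, memory.size(-1))   # (batch*slots, d_model)
        flat_sum = summary.repeat_interleave(memory.size(1), dim=0)
        new_memory = self.mem_update(flat_sum, flat_mem).reshape_as(memory)
        return out, new_memory

# Usage: slide overlapping windows over a sequence, carrying the memory forward.
layer = MemoryAwareAttention()
x = torch.randn(8, 256, 128)                 # (batch, seq_len, d_model)
memory = None
for start in range(0, 256 - 64 + 1, 32):     # windows of 64 with 50% overlap
    out, memory = layer(x[:, start:start + 64], memory)
```

The hierarchical contrastive losses (token-window Gaussian, memory-state, and global [CLS]) would then be applied to `out`, `memory`, and a pooled sequence embedding, respectively; their exact forms are specified in the paper.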
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 13829