Abstract: Real-world time series often exhibit strong non-stationarity, complex nonlinear dynamics, and behavior expressed across multiple temporal scales, from rapid local fluctuations to slowly evolving long-range trends. However, many contemporary architectures impose rigid, fixed-scale structural priors (such as patch-based tokenization, predefined receptive fields, or frozen backbone encoders), which can over-regularize temporal dynamics and limit adaptability to abrupt high-magnitude events. To address this, we introduce the Multi-scale Temporal Network (MSTN), a hybrid neural architecture grounded in an Early Temporal Aggregation (ETA) principle. MSTN integrates three complementary components: (i) a multi-scale convolutional encoder that captures fine-grained local structure; (ii) a sequence modeling module that learns long-range dependencies through either recurrent or attention-based mechanisms; and (iii) a self-gated fusion stage incorporating squeeze–excitation and a single dense layer to dynamically reweight and fuse multi-scale representations. Importantly, MSTN applies early temporal aggregation immediately after encoding, so that all subsequent refinement and prediction modules operate in constant time, O(1), with respect to the sequence length L, while the front-end encoder retains its original complexity (O(L²) for Transformer, O(L) for BiLSTM). This design enables MSTN to flexibly model temporal patterns spanning milliseconds to extended horizons while avoiding the computational burden typically associated with long-context models. Across extensive benchmarks covering imputation, long-term forecasting, classification, and cross-dataset generalization, MSTN achieves state-of-the-art performance, establishing new best results on 21 of 27 datasets, while remaining lightweight (∼0.40M params for MSTN-BiLSTM and ∼1.06M for MSTN-Transformer) and suitable for low-latency inference (<1 sec, often in milliseconds) and resource-constrained deployment.
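The constant-time claim for the post-aggregation stages can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the aggregation is mean pooling over time (the abstract does not name the operator), and the function names, layer sizes, and random weights are all hypothetical. It shows only that once an L x C feature sequence is pooled into a length-C vector, a squeeze-excitation-style gate does work that depends on C, not on L.

```python
import math
import random


def early_temporal_aggregation(features):
    """Mean-pool an (L x C) feature sequence over time into a length-C vector.

    Assumption: mean pooling stands in for the aggregation operator.
    """
    length = len(features)
    channels = len(features[0])
    return [sum(step[c] for step in features) / length for c in range(channels)]


def dense(vec, weights, bias):
    """Apply one fully connected layer: weights has one row per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def se_gate(vec, w_down, b_down, w_up, b_up):
    """Squeeze-excitation-style gating on an already pooled vector.

    Bottleneck (ReLU) -> expansion (sigmoid) -> channel-wise rescaling.
    The cost depends only on the channel count, not on sequence length.
    """
    hidden = [max(0.0, h) for h in dense(vec, w_down, b_down)]
    gates = [1.0 / (1.0 + math.exp(-g)) for g in dense(hidden, w_up, b_up)]
    return [g * x for g, x in zip(gates, vec)]


def run_head(length, channels=8, reduced=2, seed=0):
    """Hypothetical driver: random encoder output of the given length,
    pooled and gated with random weights."""
    rng = random.Random(seed)
    features = [[rng.gauss(0.0, 1.0) for _ in range(channels)]
                for _ in range(length)]
    w_down = [[rng.gauss(0.0, 0.1) for _ in range(channels)]
              for _ in range(reduced)]
    w_up = [[rng.gauss(0.0, 0.1) for _ in range(reduced)]
            for _ in range(channels)]
    pooled = early_temporal_aggregation(features)  # O(L) pooling pass
    return se_gate(pooled, w_down, [0.0] * reduced, w_up, [0.0] * channels)
```

Calling `run_head(16)` and `run_head(1024)` both return a vector of 8 values: only the pooling pass (and, in the real model, the front-end encoder) scales with the input length.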
Code: https://github.com/SumitPTW/MSTN
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We thank the action editor and reviewers for their constructive feedback. Key changes:
1. O(1) complexity clarified throughout: only the downstream refinement and prediction modules after Early Temporal Aggregation (ETA) operate in O(1); the front-end encoder retains its original complexity (O(L²) for Transformer, O(L) for BiLSTM). Updated in the abstract, introduction, contributions, and complexity discussion.
2. ETA language toned down: we now acknowledge that the variant without ETA (w/o ETA) can be competitive (e.g., on PEMS-SF). The language now reflects that ETA is "often beneficial" rather than "universally superior."
3. t-SNE overstatements removed: replaced phrases like "conclusively demonstrate" with "indicate," "lossless compression" with "effective compression," and "no advantage" with "little advantage." The analysis is now presented as suggestive evidence.
4. A public code repository with concrete instructions is provided.
The revised camera-ready manuscript has been uploaded.
Assigned Action Editor: ~Markus_Lange-Hegermann1
Submission Number: 7524