Unveiling Signal Property Usage in Transformers for Time Series Classification

Published: 22 Sept 2025 · Last Modified: 27 Nov 2025 · WiML @ NeurIPS 2025 · CC BY 4.0
Keywords: Time Series Classification, Transformer Models, Signal Properties, Interpretability
Abstract: Transformers~\cite{vaswani2017attention} are increasingly applied to Time Series Classification (TSC) to model long-term dependencies. Transformer variants for TSC include TARNet \cite{chowdhury2022tarnet}, GTN \cite{liu2021gated}, and TrajFormer \cite{liang2022trajformer}. TARNet jointly optimizes classification and reconstruction to preserve temporal dynamics, GTN uses gating to capture inter-variable dependencies in multivariate series, and TrajFormer incorporates spatial-aware attention for trajectory data. Despite these innovations, it remains unclear whether these models, including the vanilla Transformer, effectively utilize the fundamental signal properties that are critical for TSC. This paper introduces TranSPAN (Transformer Signal Property ANalysis), a novel framework for analyzing how both standard and time series–specialized Transformer models capture key signal properties in TSC, including frequency, amplitude, phase, trend, seasonality, and sharp transitions. For a given Transformer-based TSC (TTSC) model, we first train a decoder to reconstruct the input signal from the encoder output. We then iteratively mask individual attention heads (AHs), reconstruct the signal, and measure the drop in each signal property relative to the original input (see Figure~\ref{fig:overall_model_architecture}). Property changes are quantified using a discrete wavelet transform (Daubechies-4, level 3). Each AH is assigned a property attribution score based on these differences, with drops exceeding $20\%$ indicating that the AH is responsible for extracting the corresponding property. Using this systematic approach, TranSPAN enables evaluation of how TTSC models internalize core signal properties across layers, datasets, architectures, training configurations, and preprocessing strategies. We apply TranSPAN to four TTSC models (vanilla Transformer, TrajFormer, TARNet, and GTN) across three benchmark datasets: UCI HAR, ECG5000, and FordA. Vanilla Transformers show minimal drops across properties, indicating weak utilization of signal characteristics. TrajFormer exhibits moderate improvements, especially in frequency, amplitude, and trend, but remains insensitive to seasonality. TARNet captures frequency, amplitude, trend, and sharp transitions more effectively. GTN consistently shows the highest and most uniform property drops, reflecting strong attention to both local properties (amplitude, sharp transitions) and global properties (trend, phase, seasonality). Layer-wise analysis reveals a hierarchical learning pattern: lower layers predominantly specialize in local properties (amplitude, sharp transitions, frequency), while deeper layers encode more complex global properties (trend and seasonality). Dataset-specific characteristics further influence model sensitivity, with FordA emphasizing sharp transitions, ECG5000 enhancing frequency and trend, and UCI HAR supporting moderate trend and seasonality. Training dynamics also modulate learning: smaller batch sizes improve responsiveness to fine-grained local properties, whereas larger batches foster more stable acquisition of global properties. Preprocessing choices, such as derivative-based transformations, strengthen sensitivity to frequency, phase, and sharp transitions but slightly reduce amplitude sensitivity.
Collectively, these findings highlight that specialized architectures, together with carefully tuned training and preprocessing strategies, enable Transformers to exploit fundamental signal properties more effectively, thereby enhancing interpretability in time series classification.
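Illustrative sketch (not the authors' implementation): the snippet below shows, in Python with NumPy and PyWavelets, the general shape of the per-head attribution step described above, where a signal reconstructed with one attention head masked is compared against the original input via a Daubechies-4, level-3 discrete wavelet transform and a 20% drop threshold. The abstract does not specify how wavelet coefficients map onto the named signal properties, so band-wise energies are used here only as a stand-in, and the masked-head reconstruction is a placeholder.

import numpy as np
import pywt

def wavelet_energy_profile(signal, wavelet="db4", level=3):
    # Band-wise energies from a Daubechies-4, level-3 DWT, used here as a
    # simple proxy for the signal-property content of a (reconstructed) signal.
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([np.sum(c ** 2) for c in coeffs])

def property_drop(original, reconstructed):
    # Relative drop in each wavelet band between the original input and the
    # reconstruction obtained with one attention head masked.
    e_orig = wavelet_energy_profile(original)
    e_rec = wavelet_energy_profile(reconstructed)
    return (e_orig - e_rec) / (e_orig + 1e-12)

# Toy usage: a head is attributed a property if masking it causes a >20% drop.
THRESHOLD = 0.20
x = np.sin(np.linspace(0, 20 * np.pi, 512)) + 0.1 * np.random.randn(512)
x_masked_rec = 0.7 * x  # placeholder for the decoder output with one head masked
drops = property_drop(x, x_masked_rec)
attributed_bands = np.nonzero(drops > THRESHOLD)[0]
print(drops.round(3), attributed_bands)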
Submission Number: 201