UnitNorm: Rethinking Normalization for Transformers in Time Series

02 Mar 2026 (modified: 09 Mar 2026) · Under review for TMLR · CC BY 4.0
Abstract: Normalization techniques are crucial for the performance and stability of Transformer models in time series analysis, yet we are the first to identify that traditional methods such as batch and layer normalization often cause issues including token shift, attention shift, and sparse attention. We propose UnitNorm, a novel normalization approach that scales input vectors by their norms and modulates attention patterns, effectively circumventing these challenges. Grounded in existing normalization frameworks, UnitNorm's effectiveness is demonstrated across diverse time series analysis tasks, including forecasting, classification, and anomaly detection, via a rigorous evaluation on 6 state-of-the-art models and 10 datasets. UnitNorm delivers superior performance, particularly where robust attention and contextual understanding are vital, achieving up to a 1.46 decrease in MSE for forecasting and a 4.89% accuracy increase for classification. This work not only calls for a re-evaluation of normalization strategies in time series Transformers but also sets a new direction for enhancing model performance and stability.
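The abstract describes UnitNorm's core operation as scaling input vectors by their norms. A minimal sketch of that scaling step is shown below, assuming per-token L2 normalization; the function name `unit_norm`, the `eps` stabilizer, and the tensor shapes are illustrative assumptions, and the attention-modulation component mentioned in the abstract is not reproduced here.

```python
import numpy as np

def unit_norm(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each token vector by its L2 norm (hypothetical sketch of
    UnitNorm's scaling step; not the paper's exact formulation)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / (norms + eps)

# Example: a sequence of 4 tokens with model dimension 16.
tokens = np.random.randn(4, 16)
normed = unit_norm(tokens)
# After scaling, every token vector has (near-)unit L2 norm,
# unlike layer normalization, which also centers and re-scales features.
print(np.linalg.norm(normed, axis=-1))
```

Unlike batch or layer normalization, this operation subtracts no mean and learns no affine parameters, which is consistent with the abstract's claim of avoiding token shift.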
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Philip_K._Chan1
Submission Number: 7741