Rethinking Transformer Inputs for Time-Series via Neural Temporal Embedding

ICLR 2026 Conference Submission 10879 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Time Series Forecasting, Positional Encoding Elimination, Neural Temporal Embedding (NTE)
TL;DR: We propose Neural Temporal Embedding (NTE), a simple input mechanism that replaces value embedding and positional encoding, showing that Transformers for time-series forecasting can be improved through input-stage changes alone.
Abstract: Transformer-based models, originally introduced in natural language processing (NLP), have recently demonstrated strong performance in time-series forecasting. Because the attention mechanism is order-agnostic, these models rely on positional encoding (PE) to capture temporal information. However, recent studies have reported that simple linear models can outperform complex Transformer architectures, and other works have shown that modifying the Transformer's input design can improve performance. Motivated by these findings, we propose Neural Temporal Embedding (NTE), an embedding mechanism that internalizes temporal dependencies without relying on either value embedding or positional encoding. NTE uses simple neural modules such as Conv1D and LSTM to process each variable's time series independently and learn temporal patterns directly. As a result, it removes the linear value-embedding projection and the positional encoding from the input stage, allowing the model to achieve architectural flexibility and competitive performance simultaneously. Experimental results on standard benchmarks including ETT, ECL, and Weather show that the proposed NTE-based models match or outperform state-of-the-art Transformer variants, and in particular maintain stable accuracy in long-horizon forecasting. These empirical findings show that Transformer-based models for time-series forecasting can be improved through simple input enhancements without complex architectural modifications, suggesting new possibilities for simpler and more generalizable input architectures.
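
To make the described input mechanism concrete, the following is a minimal, hypothetical sketch of an NTE-style embedding layer, based only on the abstract's description (a Conv1D or LSTM applied to each variable's window, with no value embedding or positional encoding). Module names, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an NTE-style input embedding (assumptions, not the paper's code).
import torch
import torch.nn as nn

class NeuralTemporalEmbedding(nn.Module):
    """Embeds each variable's look-back window with a small temporal network,
    replacing the usual linear value embedding + positional encoding."""
    def __init__(self, d_model: int, mode: str = "conv"):
        super().__init__()
        self.mode = mode
        if mode == "conv":
            # Conv1D over the time axis of a single variable, pooled to one token.
            self.encoder = nn.Sequential(
                nn.Conv1d(1, d_model, kernel_size=3, padding=1),
                nn.GELU(),
                nn.AdaptiveAvgPool1d(1),
            )
        else:
            # LSTM alternative: the final hidden state summarizes the window.
            self.encoder = nn.LSTM(input_size=1, hidden_size=d_model, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_vars) -> one embedding token per variable
        b, t, n = x.shape
        x = x.permute(0, 2, 1).reshape(b * n, 1, t)        # (b*n, 1, t)
        if self.mode == "conv":
            z = self.encoder(x).squeeze(-1)                 # (b*n, d_model)
        else:
            _, (h, _) = self.encoder(x.transpose(1, 2))     # LSTM expects (b*n, t, 1)
            z = h[-1]                                       # (b*n, d_model)
        return z.reshape(b, n, -1)                          # (batch, n_vars, d_model)

# Usage: tokens = NeuralTemporalEmbedding(d_model=512)(window)
# where window has shape (batch, seq_len, n_vars). The resulting per-variable
# tokens can be fed to a Transformer encoder directly, with no positional
# encoding added, since temporal order is absorbed by the Conv1D/LSTM itself.
```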
Primary Area: learning on time series and dynamical systems
Submission Number: 10879