Abstract: Time series forecasting plays a crucial role in many real-world applications, and numerous complex forecasting models have been proposed in recent years. Despite their architectural innovations, most state-of-the-art models report only marginal improvements, typically a few thousandths in standard error metrics. Many of these models incorporate data embedding layers that transform raw inputs into higher-dimensional representations, on the assumption that this improves accuracy. But are data embedding techniques actually effective in time series forecasting? Through extensive ablation studies of fifteen state-of-the-art models on multiple benchmark datasets, we find that removing the data embedding layers does not degrade forecasting performance; in many cases it improves both accuracy and computational efficiency. The gains from removing embedding layers often exceed the performance differences typically reported between competing models.
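For readers unfamiliar with the layers under study, the sketch below shows a typical data embedding of the kind the abstract describes: a linear value projection plus a fixed sinusoidal positional encoding, loosely in the style of Informer-family forecasters. All names here (`DataEmbedding`, `value_embedding`, `d_model`) are illustrative and do not refer to any specific model evaluated in the paper.

```python
import math

import torch
import torch.nn as nn


class DataEmbedding(nn.Module):
    """Illustrative data embedding: lift raw inputs to d_model dimensions
    and add a fixed sinusoidal positional encoding (assumes even d_model).
    Hypothetical sketch, not the module of any particular evaluated model."""

    def __init__(self, n_features: int, d_model: int, max_len: int = 5000):
        super().__init__()
        # Value embedding: project each timestep's features to d_model.
        self.value_embedding = nn.Linear(n_features, d_model)
        # Standard sinusoidal positional encoding (fixed, not trained).
        position = torch.arange(max_len).unsqueeze(1).float()
        div_term = torch.exp(
            torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_features) -> (batch, seq_len, d_model)
        return self.value_embedding(x) + self.pe[: x.size(1)]
```

For instance, `DataEmbedding(n_features=7, d_model=512)` mirrors a common setup for the ETT datasets, which have seven input variables.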
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We revised the paper substantially based on reviewer feedback:
- Added new experiments on three non-ETT datasets: Exchange, Weather, and National Illness.
- Expanded efficiency reporting: wall-clock breakdowns, GPU memory (peak/reserved), and inference latency/throughput.
- Clarified the embedding-removal protocol (a brief sketch of the idea follows this list) and added two tables in Appendix A.2 summarizing architectural invariances.
- Expanded the related work section and clarified the model selection rationale.
- Added background descriptions for all datasets and clarified data splits.
- Added five simpler baseline models (RNN, LSTM, GRU, ConvLSTM, BiLSTM) to complement the SOTA comparisons.
- Clarified embedding dimensionality in Section 2 and added a table summarizing actual embedding sizes used.
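As a minimal illustration of what an embedding-removal protocol can look like in code (an assumption-laden sketch, not the paper's exact procedure): it assumes the model exposes its embedding as an `enc_embedding` attribute, as Informer-style codebases often do, and that replacing it with a bare linear projection preserves the (batch, seq_len, d_model) shapes downstream layers expect.

```python
import torch.nn as nn


def strip_embedding(model: nn.Module, n_features: int, d_model: int) -> nn.Module:
    """Hypothetical helper: replace a composite data-embedding module
    (value + positional + temporal embeddings) with a bare linear map,
    leaving the rest of the architecture untouched."""
    # `enc_embedding` is an assumed attribute name, not a universal API.
    model.enc_embedding = nn.Linear(n_features, d_model)
    return model
```

The design point is that only the dimensionality-matching projection is kept, so any accuracy change can be attributed to the removed positional and temporal components rather than to a shape mismatch.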
Assigned Action Editor: ~Devendra_Singh_Dhami1
Submission Number: 6021