SynTSBench: Rethinking Temporal Pattern Learning in Deep Learning Models for Time Series

Qitai Tan; Yiyun Chen; Mo Li; Ruiwen Gu; Yilin Su; Xiao-Ping Zhang

Published: 18 Sept 2025, Last Modified: 30 Oct 2025

Venue: NeurIPS 2025 Datasets and Benchmarks Track (poster)

License: CC BY 4.0

Keywords: Time series forecasting, Synthetic data-based evaluation framework, Controlled synthetic environments

TL;DR: A Fine-grained Capability Assessment Framework for Time Series Forecasting Models Using Synthetic Data

Abstract: Recent advances in deep learning have driven rapid progress in time series forecasting, yet many state-of-the-art models continue to struggle with robust performance in real-world applications, even when they achieve strong results on standard benchmark datasets. This persistent gap can be attributed to the black-box nature of deep learning architectures and the inherent limitations of current evaluation frameworks, which frequently lack the capacity to provide clear, quantitative insights into the specific strengths and weaknesses of different models, thereby complicating the selection of appropriate models for particular forecasting scenarios. To address these issues, we propose a synthetic data-driven evaluation paradigm, SynTSBench, that systematically assesses fundamental modeling capabilities of time series forecasting models through programmable feature configuration. Our framework isolates confounding factors and establishes an interpretable evaluation system with three core analytical dimensions: (1) temporal feature decomposition and capability mapping, which enables systematic evaluation of model capacities to learn specific pattern types; (2) robustness analysis under data irregularities, which quantifies noise tolerance thresholds and anomaly recovery capabilities; and (3) theoretical optimum benchmarking, which establishes performance boundaries for each pattern type, enabling direct comparison between model predictions and mathematical optima. Our experiments show that current deep learning models do not universally approach optimal baselines across all types of temporal features.
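As a rough illustration of the idea behind theoretical optimum benchmarking, the sketch below builds a synthetic series from a known trend and seasonal component plus Gaussian noise, then compares a naive forecast against an oracle that knows the noise-free signal. This is not the SynTSBench generator: the function names (`make_series`, `mse`), the parameters, and the choice of a linear trend with a single sine wave are all assumptions made for this example. Under additive zero-mean noise, the oracle's expected MSE equals the noise variance, which gives a floor that no forecaster can beat in expectation.

```python
import numpy as np


def make_series(n, slope=0.01, period=24, amplitude=1.0, noise_std=0.1, seed=0):
    """Hypothetical generator: linear trend + one sine wave + Gaussian noise.

    Returns the noise-free signal and the noisy observations.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    signal = slope * t + amplitude * np.sin(2 * np.pi * t / period)
    noisy = signal + rng.normal(0.0, noise_std, size=n)
    return signal, noisy


def mse(pred, target):
    """Mean squared error between two equal-length arrays."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))


noise_std = 0.1
signal, noisy = make_series(n=5000, noise_std=noise_std)

# Oracle one-step forecast: the true noise-free value at each step.
# Its MSE against the noisy observations estimates the noise variance.
oracle_mse = mse(signal[1:], noisy[1:])

# Naive baseline: predict each value as the previous observation.
naive_mse = mse(noisy[:-1], noisy[1:])

print(f"theoretical optimum (noise variance): {noise_std ** 2:.4f}")
print(f"oracle MSE: {oracle_mse:.4f}")
print(f"naive last-value MSE: {naive_mse:.4f}")
```

The gap between a model's MSE and the oracle's MSE on such a series indicates how much of the learnable pattern the model missed, separately from the irreducible noise.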

Croissant File: json

Dataset URL: https://huggingface.co/datasets/TanQT24/SynTSBench

Code URL: https://github.com/TanQitai/SynTSBench

Supplementary Material: zip

Primary Area: Evaluation (e.g., data collection methodology, data processing methodology, data analysis methodology, meta studies on data sources, extracting signals from data, replicability of data collection and data analysis and validity of metrics, validity of data collection experiments, human-in-the-loop for data collection, human-in-the-loop for data evaluation)

Submission Number: 1751
