Scaling-laws for Large Time-series Models

Published: 10 Oct 2024, Last Modified: 26 Nov 2024 · NeurIPS 2024 TSALM Workshop Oral · CC BY 4.0
Keywords: time-series, generative modeling, neural scaling laws
TL;DR: We establish, for the first time, that foundation models for time-series forecasting enjoy power-law scaling similar to that of LLMs and vision models with respect to model size, dataset size, and training compute.
Abstract: Scaling laws for large language models (LLMs) have provided useful guidance in training ever larger models for predictable performance gains. Time series forecasting shares a sequential structure similar to that of language, and is amenable to large-scale transformer architectures. Here we show that foundational decoder-only time series transformer models exhibit scaling behavior analogous to that of LLMs, with architectural details (aspect ratio and number of heads) having a minimal effect over broad ranges. We assemble a large corpus of heterogeneous time series data on which to train, and establish for the first time power-law scaling with parameter count, dataset size, and training compute, spanning five orders of magnitude.
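The power-law scaling referenced in the abstract is conventionally written in the standard neural scaling-law form; the sketch below uses generic symbols and exponents ($\alpha_N$, $\alpha_D$, $\alpha_C$) as illustrative placeholders, not values reported by the paper:

$$
L(N) \propto N^{-\alpha_N}, \qquad L(D) \propto D^{-\alpha_D}, \qquad L(C) \propto C^{-\alpha_C},
$$

where $L$ denotes test loss, $N$ the parameter count, $D$ the training dataset size, and $C$ the training compute. "Spanning five orders of magnitude" means these relations are observed to hold as the relevant quantity varies by roughly a factor of $10^5$.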
Submission Number: 90