Forking Sequences

Published: 22 Sept 2025 · Last Modified: 25 Nov 2025 · ScaleOPT Poster · CC BY 4.0
Keywords: time series, forecasting, optimization, gradient stability, parallel encoding
TL;DR: We formalize the forking-sequences approach and validate it on 16 datasets, showing more stable and consistent gradient updates during training, lower forecast variance through ensembling, and improved computational efficiency at inference.
Abstract: While accuracy is a critical forecasting requirement, an equally important yet often overlooked aspect is forecast stability across forecast creation dates (FCDs). Even highly accurate models can produce erratic revisions between FCDs, undermining stakeholder trust and disrupting downstream decision-making. To improve forecast stability, Amazon's production models (MQCNN, MQT, and SPADE) employ a little-known but highly effective technique: forking-sequences. Unlike standard statistical and neural forecasting methods that treat each FCD independently, forking-sequences jointly encodes and decodes the entire time series across all FCDs, in a way that mirrors time series cross-validation. In this work, we formalize the forking-sequences approach and make a case for its broader adoption by demonstrating three key benefits: (i) more stable and consistent gradient updates during training; (ii) reduced forecast variance through ensembling; and (iii) improved computational efficiency at inference. We validate the benefits of forking-sequences on 16 datasets from the M1, M3, M4, and Tourism competitions, improving forecast accuracy relative to statistical baselines and improving forecast stability, on average, by 28.8%, 28.8%, 37.9%, 31.3%, and 8.8% for MLP-, RNN-, LSTM-, CNN-, and Transformer-based architectures, respectively.
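To make the joint encode-decode idea concrete, the sketch below illustrates forking-sequences training with a generic LSTM encoder and a linear multi-horizon head: the series is encoded once, and the decoder is applied to the hidden state at every time step, so each step acts as an FCD and all FCDs share a single forward/backward pass. This is a minimal sketch under assumed choices; the names (`ForkingSeqForecaster`, `forking_targets`), the architecture, and the toy hyperparameters are illustrative and do not reproduce the paper's MQCNN/MQT/SPADE implementations.

```python
import torch
import torch.nn as nn


class ForkingSeqForecaster(nn.Module):
    """Minimal forking-sequences sketch: encode the full series once,
    decode a multi-horizon forecast from every time step (every FCD)."""

    def __init__(self, input_size: int = 1, hidden_size: int = 64, horizon: int = 12):
        super().__init__()
        self.encoder = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.decoder = nn.Linear(hidden_size, horizon)  # shared multi-horizon head

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (batch, T, 1) observed series
        h, _ = self.encoder(y)      # (batch, T, hidden): one hidden state per step
        return self.decoder(h)      # (batch, T, horizon): a forecast from every FCD


def forking_targets(y: torch.Tensor, horizon: int) -> torch.Tensor:
    """Target matrix for all FCDs: row t holds y[t+1 : t+1+horizon]."""
    _, T, _ = y.shape
    return torch.stack(
        [y[:, t + 1 : t + 1 + horizon, 0] for t in range(T - horizon)], dim=1
    )  # (batch, T - horizon, horizon)


# Toy usage: one gradient step averages the loss over all FCDs jointly,
# instead of sampling a single forecast creation date per series.
torch.manual_seed(0)
horizon = 12
model = ForkingSeqForecaster(horizon=horizon)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

y = torch.randn(8, 100, 1)                    # batch of 8 series, length 100
pred = model(y)[:, : 100 - horizon]           # keep FCDs with a full target window
tgt = forking_targets(y, horizon=horizon)
loss = nn.functional.l1_loss(pred, tgt)       # MAE across all FCDs and horizons
loss.backward()
optim.step()
```

Because the loss averages over every FCD of every series in the batch, each update aggregates many more forecast windows than per-FCD sampling would, which is the mechanism behind the smoother gradients and cheaper inference the abstract describes.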
Submission Number: 9