Temporal horizons in forecasting: a performance-learnability trade-off

TMLR Paper5096 Authors

12 Jun 2025 (modified: 08 Aug 2025) · Under review for TMLR · CC BY 4.0
Abstract: When training autoregressive models to forecast dynamical systems, a critical question arises: how far into the future should the model be trained to predict? In this work, we address this question by analyzing how the geometry of the loss landscape depends on the training horizon. Using dynamical systems theory, we prove that loss minima found with long horizons generalize well to short-term forecasts, whereas minima found with short horizons yield worse long-term predictions. However, we also prove that the loss landscape becomes rougher as the training horizon grows, making long-horizon training inherently challenging. We validate our theory through numerical experiments and discuss practical implications for selecting training horizons. Our results provide a principled foundation for hyperparameter optimization in autoregressive forecasting models.
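To make the setup concrete, the following is a minimal sketch (in PyTorch, not taken from the paper) of the kind of objective the abstract refers to: the loss is accumulated over a T-step autoregressive rollout in which the model is repeatedly fed its own predictions, and the horizon T is the hyperparameter under study. The model architecture, tensor shapes, and optimizer are illustrative assumptions.

```python
# Minimal sketch of T-step autoregressive rollout training (illustrative; the
# model, shapes, and optimizer below are assumptions, not the paper's code).
import torch
import torch.nn as nn


def rollout_loss(model: nn.Module, x0: torch.Tensor, targets: torch.Tensor, T: int) -> torch.Tensor:
    """Mean squared error averaged over a T-step autoregressive rollout.

    x0:      initial state,            shape (batch, dim)
    targets: ground-truth trajectory,  shape (batch, >= T, dim)
    """
    x = x0
    loss = x0.new_zeros(())
    for t in range(T):
        x = model(x)                                   # feed the prediction back in
        loss = loss + (x - targets[:, t]).pow(2).mean()
    return loss / T


# Example usage on a toy feed-forward AR model.
model = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x0 = torch.randn(32, 3)            # batch of initial conditions
targets = torch.randn(32, 16, 3)   # placeholder trajectories, shape (batch, T, dim)

loss = rollout_loss(model, x0, targets, T=16)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```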
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We thank the reviewers for their feedback, which we believe has improved our paper. There was some disagreement regarding the relationship between ARNNs and RNNs. After the first round of reviews we provided a simple proof that the results we initially presented for ARNNs extend to RNNs (under our initial definition). We have now adapted the text so that our theory refers to AR models in general (whether or not they have feedback loops in the hidden state and/or context windows), and we specify that the experiments use AR models implemented as feed-forward neural networks. We believe that in the current version there is no conflict regardless of how RNNs and AR models are defined, and that the paper is more impactful as a first step toward applying our theory to Transformers and RNNs.

**First review** We have addressed the points raised by the reviewers:
- We have clarified our key theoretical contributions, focusing on the geometry of the loss landscape rather than the simpler aspects of gradient growth, and noting that we focus on feed-forward autoregressive neural networks.
- We have provided a proof that extends our initial ARNN results to RNNs and Transformers.
- We have added a third real time series, from finance, which is generally considered stochastic and non-stationary and thus falls outside the range of our assumptions.
- We have clarified the problems with using grid search to find the optimal training horizon, and placed our algorithm for iteratively increasing T in that context (a schematic sketch of such a schedule is given below).
- We have included references where ARNNs surpass RNNs.

We believe that our manuscript has improved significantly and is now ready for publication.
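The "iteratively increasing T" procedure mentioned above could look like the following minimal sketch: training starts at a short horizon, where the loss landscape is smoother, and the horizon is grown in stages toward the target value. The schedule constants and the commented training loop are illustrative assumptions, not the authors' exact algorithm; `rollout_loss` refers to the sketch given after the abstract.

```python
# Illustrative horizon schedule (an assumption, not the authors' exact procedure):
# grow the training horizon T in stages instead of grid-searching it.
def horizon_schedule(T_max: int, n_stages: int):
    """Yield increasing horizons 1 <= T_1 <= ... <= T_max."""
    for k in range(1, n_stages + 1):
        yield max(1, round(T_max * k / n_stages))


# Example: a few epochs at each horizon, reusing rollout_loss from the earlier sketch.
# for T in horizon_schedule(T_max=64, n_stages=4):       # T = 16, 32, 48, 64
#     for _ in range(epochs_per_stage):
#         for x0, targets in loader:                      # targets: (batch, T_max, dim)
#             loss = rollout_loss(model, x0, targets, T)
#             optimizer.zero_grad(); loss.backward(); optimizer.step()
```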
Assigned Action Editor: ~Andreas_Lehrmann1
Submission Number: 5096