Abstract: Obtaining accurate probabilistic forecasts is an important operational challenge in many applications, perhaps most obviously in energy management, climate forecasting, supply chain planning, and resource allocation. In many of these applications, there is a natural hierarchical structure over the forecasted quantities, and forecasting systems that adhere to this hierarchical structure are said to be coherent. Furthermore, operational planning benefits from accuracy at all levels of the aggregation hierarchy. Building accurate and coherent forecasting systems, however, is challenging: classic multivariate time series tools and neural network methods are still being adapted for this purpose. In this paper, we augment an MQForecaster neural network architecture with a novel deep Gaussian factor forecasting model that achieves coherence by construction, yielding a method we call the Deep Coherent Factor Model Neural Network (DeepCoFactor). DeepCoFactor generates samples that can be differentiated with respect to model parameters, allowing optimization of various sample-based learning objectives that align with the forecasting system's goals, including quantile loss and the scaled Continuous Ranked Probability Score (CRPS). In a comparison to state-of-the-art coherent forecasting methods, DeepCoFactor achieves significant improvements in scaled CRPS forecast accuracy, with gains between 4.16% and 54.40%, as measured on three publicly available hierarchical forecasting datasets.
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=B2LY1P3vrx
Changes Since Last Submission: Dear Action Editor and Reviewers,
We are pleased to resubmit our paper. This is a resubmission of a paper with the same title which had submission number 1407, and which we withdrew for major revision. We have made significant improvements to our DeepCoFactor model and addressed the feedback received from the previous submission. These enhancements have resulted in better empirical outcomes. The main modifications to the paper compared to the prior version are as follows:
### Clarifications - Section 1
- We removed the exchangeability assumption, which confused the reviewers and is not required for our DeepCoFactor model.
- We state the hierarchical coherence property and demonstrate how our coherent aggregation approach satisfies it. Coherent aggregation combines two aspects of our method: 1) modeling correlations between bottom-level series, and 2) aggregating samples from these series, allowing us to provide sample forecasts for the aggregates. For more details, please refer to Definition 1.2 in Section 1.
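
To make the two aspects of coherent aggregation concrete, here is a minimal NumPy sketch. It is illustrative only and is not the DeepCoFactor architecture: the seven-node hierarchy, the summing matrix `S`, and the parameters `mu`, `F`, and `sigma` are all assumptions made up for this example (in a neural model they would be network outputs). Correlated bottom-level samples are drawn from a Gaussian factor model and then aggregated, so every sample path satisfies the hierarchy exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hierarchy: total -> 2 groups -> 4 bottom-level series.
# Each row of the summing matrix S selects the bottom series a node aggregates.
S = np.array([
    [1, 1, 1, 1],  # total
    [1, 1, 0, 0],  # group A
    [0, 0, 1, 1],  # group B
    [1, 0, 0, 0],  # bottom series 1
    [0, 1, 0, 0],  # bottom series 2
    [0, 0, 1, 0],  # bottom series 3
    [0, 0, 0, 1],  # bottom series 4
], dtype=float)

n_bottom, n_factors, n_samples = 4, 2, 1000

# Illustrative (assumed) parameters of a Gaussian factor model.
mu = np.array([10.0, 20.0, 5.0, 15.0])      # bottom-level means
F = rng.normal(size=(n_bottom, n_factors))  # factor loadings (shared noise)
sigma = np.array([1.0, 2.0, 0.5, 1.5])      # idiosyncratic scales

# 1) Correlated bottom-level samples: shared factors z induce cross-series correlation.
z = rng.normal(size=(n_samples, n_factors))
eps = rng.normal(size=(n_samples, n_bottom))
bottom_samples = mu + z @ F.T + sigma * eps  # shape (n_samples, n_bottom)

# 2) Aggregate each sample through S to get forecasts for every node.
all_samples = bottom_samples @ S.T           # shape (n_samples, 7)

# Coherence holds sample by sample: the total equals the sum of the bottom series.
coherent = np.allclose(all_samples[:, 0], all_samples[:, 3:].sum(axis=1))
```

Because the aggregate samples are linear functions of the bottom-level samples, coherence does not depend on the parameter values; it follows from the construction itself.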
### Model improvements - Section 2
- We simplified the factor model probability in favor of Gaussian factors (see Section 2.1).
- We confirmed the benefits of estimating the model with a CRPS learning objective (see Ablation Studies in Section 3.3). By optimizing CRPS, we align the learning objective with the evaluation metric, making the probabilistic model estimation robust to mis-specification.
- We added a CrossSeriesMLP layer that adapts the behavior of DeepCoFactor into a vector autoregressive (VAR) forecast architecture. The CrossSeriesMLP captures Granger causality, greatly improving performance on the Traffic dataset (see Section 2.2).
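
As a reference point for the sample-based CRPS objective mentioned above, the following sketch implements the standard energy-form estimator, CRPS(F, y) = E|X - y| - 0.5 E|X - X'|, from a set of forecast samples. This is an assumption about the general shape of such an objective, not the paper's exact scaled CRPS loss; the function name `crps_from_samples` is hypothetical. It is written in NumPy for readability; the same expression written in an automatic-differentiation framework is differentiable with respect to the samples.

```python
import numpy as np

def crps_from_samples(samples, y):
    """Energy-form CRPS estimate for one observation y from 1-D forecast samples."""
    samples = np.asarray(samples, dtype=float)
    # Average distance between the samples and the observation: E|X - y|.
    term_obs = np.mean(np.abs(samples - y))
    # Average pairwise distance between samples (all pairs, including i == j): E|X - X'|.
    term_spread = np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term_obs - 0.5 * term_spread

# Two samples at 0 and 2 with observation 1: 1.0 - 0.5 * 1.0 = 0.5.
example = crps_from_samples(np.array([0.0, 2.0]), 1.0)
```

The pairwise term costs O(n²) in the number of samples, which is usually acceptable for the few hundred samples typical of training-time estimates.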
### Empirical Results - Section 3.2
- Our DeepCoFactor model improves on the best baselines by 27.67%, 4.16%, and 54.40% on Favorita, Tourism-Large, and Traffic, respectively, for the CRPS metric.
- Our DeepCoFactor model improves on the best baselines by 22.19%, 17.03%, and 96.31% on Favorita, Tourism-Large, and Traffic, respectively, for the relative squared error metric.
- We augmented the empirical results with the evaluation of mean forecasts at all hierarchical levels using the relative squared error as requested.
### Ablation studies - Section 3.3
- We study the effects of DeepCoFactor's learning objective, comparing CRPS and log likelihood as requested. We also compare different distributional assumptions. See Section 3.3 and Figure 3(a). Training with CRPS improves results by 60% over optimizing log likelihood.
- We performed an ablation study on our novel CrossSeriesMLP vector autoregressive layer. On the Traffic dataset, this layer improves results by 66% compared to omitting it, likely due to the presence of Granger-causal relationships among the traffic intersections.
Assigned Action Editor: ~Mark_Coates1
Submission Number: 3148