Abstract: Probabilistic predictions are probability distributions over the set of possible outcomes. Such predictions quantify the uncertainty in the outcome, making them essential for effective decision making. Combining multiple predictions pools the information sources used to generate them, often resulting in a more informative forecast. Probabilistic predictions are typically combined by linearly pooling the individual predictive distributions, a strategy that encompasses several ensemble learning techniques. The weights assigned to each prediction can be estimated based on their past performance, allowing more accurate predictions to receive a higher weight. This can be achieved by finding the weights that optimise a proper scoring rule over some training data. By embedding predictions into a Reproducing Kernel Hilbert Space (RKHS), we show that estimating the linear pool weights that optimise kernel-based scoring rules is a convex quadratic optimisation problem. This permits an efficient implementation of the linear pool when optimally combining predictions on arbitrary outcome domains. This result also holds for other combination strategies, and we additionally study a flexible generalisation of the linear pool that overcomes some of its theoretical limitations, whilst allowing an efficient implementation within the RKHS framework. These approaches are compared in an application to operational wind speed forecasts, where this generalisation is found to offer substantial improvements upon the traditional linear pool.
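To make the abstract's central claim concrete, the following is a minimal sketch, assuming the component forecasts are available as ensembles of samples, of how the linear pool weights could be estimated by minimising the average kernel score (here the energy kernel, whose kernel score is the CRPS). The array names (`ensembles`, `obs`), their shapes, and the use of SLSQP are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def energy_kernel(x, xp):
    # k(x, x') = |x| + |x'| - |x - x'|; the kernel score of this kernel is the CRPS
    return np.abs(x) + np.abs(xp) - np.abs(x - xp)

def qp_terms(ensembles, obs, kernel=energy_kernel):
    """Average the quadratic (K) and linear (c) terms of the kernel score over
    training cases, so that the mean score of the linear pool with weights w
    is 0.5 * w @ K @ w - w @ c + const.
    ensembles: (n_cases, n_models, n_members); obs: (n_cases,)."""
    n_cases, n_models, _ = ensembles.shape
    K = np.zeros((n_models, n_models))
    c = np.zeros(n_models)
    for x_n, y_n in zip(ensembles, obs):
        # K_ij averages k over pairs of ensemble members from models i and j
        K += kernel(x_n[:, None, :, None], x_n[None, :, None, :]).mean(axis=(2, 3))
        c += kernel(x_n, y_n).mean(axis=1)
    return K / n_cases, c / n_cases

def fit_weights(ensembles, obs):
    K, c = qp_terms(ensembles, obs)
    m = len(c)
    # Convex QP over the probability simplex; SLSQP is used here for simplicity
    res = minimize(
        fun=lambda w: 0.5 * w @ K @ w - w @ c,
        jac=lambda w: K @ w - c,
        x0=np.full(m, 1.0 / m),
        bounds=[(0.0, 1.0)] * m,
        constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
        method="SLSQP",
    )
    return res.x
```

Because the Gram term `K` is positive semi-definite for a positive definite kernel, the objective is convex and the simplex constraints are linear, so any off-the-shelf QP solver would also apply.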
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: The major changes that have been made since the initial submission are listed below. Further details are provided in the responses to the reviewers' comments.
### Changes
- We provide further information regarding the use of functional weights in Section 4.2. We demonstrate how these can be handled within the framework of our Proposition 2 if the weight functions are defined as linear combinations of basis functions, in which case the optimisation problem again becomes quadratic in the basis function coefficients; a minimal sketch of this construction is given after this list. This is explored in detail in a new appendix, Appendix B.
- We present additional results when other kernels are used to define the loss function for estimating the combination weights. In addition to the energy kernel (corresponding to the CRPS or energy score), we consider popular kernels including the Gaussian (squared exponential), Laplacian (exponential), Matérn, and inverse multiquadric kernels; a sketch of these kernels also follows this list. We confirm empirically that, when the CRPS (energy score) is used to evaluate forecast performance, the best forecasts are obtained when the linear pool weights are estimated by minimising the CRPS (energy score) on the training data. These results are shown in Section 5.4. We additionally study the behaviour of the weights estimated using different kernel scores, and find that they are relatively insensitive to the choice of kernel in the application presented here. These additional results are shown in Appendix C.
- We make the connection between the proposed framework and kernel ridge regression more explicit. In particular, we discuss how the resulting forecast distribution or the estimated weight vector can be regularised by adding a penalty term to the average score, which again results in a convex quadratic optimisation problem, as in kernel ridge regression; a sketch of this regularised problem closes the list below. This discussion can be found in a new remark, Remark 3, after Proposition 2.
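The functional-weights construction in the first bullet admits a short sketch: when each weight is a linear combination of basis functions of a covariate s, so that w_j(s) = Σ_l β_{jl} φ_l(s), the mean kernel score is quadratic in the stacked coefficients, and the simplex constraints at the training covariates remain linear. Here the per-case terms `K_all` and `c_all` are assumed to be computed as in the sketch after the abstract; all names and shapes are illustrative, not the construction of Appendix B itself.

```python
import numpy as np
from scipy.optimize import minimize

def fit_basis_weights(K_all, c_all, basis):
    """K_all: (n_cases, n_models, n_models) per-case Gram terms;
    c_all: (n_cases, n_models) per-case linear terms;
    basis: (n_cases, n_basis) basis functions evaluated at each case's covariate.
    Returns the coefficient matrix beta with shape (n_models, n_basis)."""
    n_cases, n_models = c_all.shape
    n_basis = basis.shape[1]
    # A[n] maps the stacked coefficients beta to the weights at case n: w_n = A[n] @ beta
    A = np.stack([np.kron(np.eye(n_models), basis[n]) for n in range(n_cases)])
    # The mean score is again quadratic: 0.5 * beta @ Q @ beta - beta @ q + const
    Q = np.mean([A[n].T @ K_all[n] @ A[n] for n in range(n_cases)], axis=0)
    q = np.mean([A[n].T @ c_all[n] for n in range(n_cases)], axis=0)
    # Simplex constraints at every training covariate are linear in beta
    A_flat = A.reshape(n_cases * n_models, n_models * n_basis)
    cons = [
        {"type": "ineq", "fun": lambda b: A_flat @ b},             # w_j(s_n) >= 0
        {"type": "eq", "fun": lambda b: A.sum(axis=1) @ b - 1.0},  # sum_j w_j(s_n) = 1
    ]
    # Start from coefficients reproducing equal weights as closely as possible
    b0, *_ = np.linalg.lstsq(A_flat, np.full(n_cases * n_models, 1.0 / n_models), rcond=None)
    res = minimize(
        fun=lambda b: 0.5 * b @ Q @ b - b @ q,
        jac=lambda b: Q @ b - q,
        x0=b0,
        constraints=cons,
        method="SLSQP",
    )
    return res.x.reshape(n_models, n_basis)
```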
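For the kernels named in the second bullet, the following sketch gives plausible forms (Matérn shown with smoothness 3/2) together with a sample-based estimator of the corresponding kernel score; the lengthscale values are illustrative hyperparameters, not those used in the paper.

```python
import numpy as np

def gaussian(x, xp, ls=1.0):
    return np.exp(-((x - xp) ** 2) / (2.0 * ls**2))

def laplacian(x, xp, ls=1.0):
    return np.exp(-np.abs(x - xp) / ls)

def matern32(x, xp, ls=1.0):
    d = np.sqrt(3.0) * np.abs(x - xp) / ls
    return (1.0 + d) * np.exp(-d)

def inverse_multiquadric(x, xp, ls=1.0):
    return 1.0 / np.sqrt(1.0 + ((x - xp) / ls) ** 2)

def kernel_score(sample, y, kernel):
    """Sample-based estimate of S_k(P, y) = 0.5 E k(X, X') - E k(X, y) + 0.5 k(y, y),
    where `sample` is a one-dimensional ensemble drawn from P."""
    xx = kernel(sample[:, None], sample[None, :]).mean()
    xy = kernel(sample, y).mean()
    return 0.5 * xx - xy + 0.5 * kernel(y, y)
```

Swapping any of these kernels into the QP construction sketched after the abstract changes only the Gram and linear terms; the weight-estimation problem remains a convex quadratic programme.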
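The regularisation described in the last bullet can be sketched as a one-line change to the earlier QP: a ridge penalty on the weight vector is added to the mean score, in analogy with kernel ridge regression. The penalty strength `lam` and the penalty form (a squared Euclidean norm on w) are our assumptions, not necessarily the form discussed in Remark 3.

```python
import numpy as np
from scipy.optimize import minimize

def fit_weights_ridge(K, c, lam=0.1):
    """K, c: averaged score terms as in the earlier sketch; lam: penalty strength."""
    m = len(c)
    # Adding lam/2 * ||w||^2 shifts the Gram term to K + lam * I, which is
    # positive definite, so the problem remains a (now strictly) convex QP
    K_reg = K + lam * np.eye(m)
    res = minimize(
        fun=lambda w: 0.5 * w @ K_reg @ w - w @ c,
        jac=lambda w: K_reg @ w - c,
        x0=np.full(m, 1.0 / m),
        bounds=[(0.0, 1.0)] * m,
        constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
        method="SLSQP",
    )
    return res.x
```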
Assigned Action Editor: ~Krikamol_Muandet1
Submission Number: 4145